Abstract

New large deviations results that characterize the asymptotic information rates for general $d$-dimensional
($d$-D) stationary Gaussian fields are obtained. By applying the general results to sensor nodes on a
two-dimensional (2-D) lattice, the asymptotic behavior of ad hoc sensor networks deployed over correlated
random fields for statistical inference is investigated. Under a 2-D hidden Gauss-Markov random field
model with symmetric first order conditional autoregression and the assumption of no in-network data
fusion, the behavior of the total obtainable information [nats] and of the energy efficiency [nats/J],
defined as the ratio of total gathered information to the required energy, is obtained as the coverage
area, node density, and energy vary. When the sensor node density is fixed, the energy efficiency
decreases to zero with rate $\Theta(\mathrm{area}^{-1/2})$, and the per-node information under fixed
per-node energy also diminishes to zero with rate $O(N_t^{-1/3})$ as the number $N_t$ of network nodes
increases by increasing the coverage area. As the sensor spacing $d_n$ increases, the per-node
information converges to its limit $D$ with rate $D - \sqrt{d_n}\, e^{-\alpha d_n}$ for a given diffusion
rate $\alpha$. When the coverage area is fixed and the node density increases, the per-node information is
inversely proportional to the node density. As the total energy $E_t$ consumed in the network increases,
the total information obtainable from the network is given by $O(\log E_t)$ for the fixed node density
and fixed coverage case and by $\Theta(E_t^{2/3})$ for the fixed per-node sensing energy and fixed density
and increasing coverage case.

Index Terms: Ad hoc sensor networks, large deviations principle, asymptotic Kullback-Leibler
information rate, asymptotic mutual information rate, stationary Gaussian fields, Gauss-Markov random
fields, conditional autoregressive model.

I. Introduction 

Sensor networks have drawn much attention in recent years because of their promising appli- 
cations such as scientific research, environmental monitoring, and surveillance [1]. In the design 
of sensor networks, there are several distinctive features. First, sensor networks are designed 
to sense and monitor various physical phenomena such as temperature, humidity, density of a 
certain gas or stress level of different locations in a structure. Many of these physical processes 
can be modelled as two-dimensional (2-D) random fields over a certain area, where the uncer- 
tainty of the underlying signal is captured as the randomness of samples and the proximity of 
samples close in location is modelled by the correlation among the samples. Second, sensors 
in different locations should be able to deliver the measured data to a control center (or fusion 
center) where the decision is made, and thus the communication capability is required as in ad 
hoc communication networks. Such communication functionality can be provided by networking 
sensor nodes, for example, using multi-hop routing. Third, energy is one of the critical issues 
in sensor network design since both sensing and communication require energy and it is difficult 
to recharge batteries in already deployed sensor nodes. Hence, it is of interest to design energy 
efficient sensor networks. 



Fig. 1. Ad hoc sensor network over a physical process.

In this paper, we consider the design of such sensor networks, and investigate the behavior and
efficiency of these networks from an information-theoretic perspective. From the information- 
theoretic viewpoint, the process of sensing and communication mentioned above can be viewed 
as extracting information (about the underlying 2-D physical process) using imperfect sensor 
nodes by expending energy for statistical inference such as detection or reconstruction of the 
sensed signal field [2,3], as shown in Fig. 1. Relevant questions regarding the network design 
are as follows. How much information can one obtain from the network for given coverage 
and node density? How does the amount of gathered information change as we increase the 
coverage area or node density? How do the field correlation and measurement signal-to-noise 
(SNR) affect the amount of information obtainable from the network? What is the optimal node 
density? What are the information and energy trade-offs in such a sensor network with ad hoc 
routing? Answering these questions is difficult, especially because of the 2-D spatial correlation
structure of the signal process inherent to the two-dimensionality of network deployment. To
circumvent this problem, several studies based on one-dimensional (1-D) spatial signal models 
have been conducted (see, e.g., [2], [4], [5]). However, there is an important difference between 
1-D signal models and actual spatial signals. Suppose that we take observations from sensors 
located equidistantly along a line transect laid over an area. The observations may then be 
viewed as samples generated by a 1-D process along the line transect and results from time series 
analysis could be applied to examine their statistical properties. In the 2-D case, however, there
is no natural notion of signal flow or dependence direction along the transect as there is in a
more traditionally obtained time series. For samples from sensors placed over a 2-D area, it is
necessary to consider the signal dependence in all directions in the plane.

A. The Approach and Summary of Results

In this paper, we consider ad hoc sensor networks deployed for making statistical inferences 
about underlying 2-D random fields, and address the above questions in a general 2-D setting. 
In particular, we investigate the amount of information obtainable from the network and related 
trade-offs among information, coverage, density and energy in various asymptotic settings, and 
reveal the fundamental behavior of large scale planar ad hoc sensor networks. We model the 
signal field as a 2-D Gauss-Markov random field (GMRF), which is suitable for many physical 
processes, and consider the Kullback-Leibler information (KLI) and mutual information (MI) as 
our information measures [6,7]. Our approach for calculating the total obtainable information
is based on the large deviations principle (LDP). Under a stationarity assumption, the amount
of information from a sensor node becomes independent of sensor location as the network size
grows, and the total amount of information is approximately given by the product of the number
of sensor nodes and the asymptotic information rate or asymptotic per-node information. (Thus,
the units of these quantities are nats/node.) To quantify the information content, we first derive
closed-form expressions for the asymptotic per-node KLI and MI for stationary Gaussian fields 
in a general d-dimensional (d-D) lattice in the spectral domain, and then apply these results 
to the 2-D case. We do so by exploiting the spectral structure of d-D stationary Gaussian 
signals and the relationship between the eigenvalues of the block circulant approximation to a 
block Toeplitz matrix describing the d-D correlation structure. However, the general expressions 
obtained in this way render the investigation of the field correlation and SNR difficult. To address 
this problem, we adopt the conditional autoregression (CAR) model, which is a generalization 
of the autoregressive (AR) model of classical time series analysis. We further investigate the 
properties of the asymptotic per-node KLI and MI as functions of the field correlation and the 
measurement SNR under the symmetric first order conditional autoregression (SFCAR) model, 
which captures the 2-D correlation on the plane effectively. In this case, the asymptotic per-node 
KLI and MI are given explicitly in terms of the SNR and the field correlation. The behavior of the 
asymptotic per-node KLI and MI as functions of correlation strength is seen to divide into two 
regions depending on the value of the SNR. At high SNR, uncorrelated observations maximize 
the per-node information for a given SNR, whereas there is non-zero optimal correlation at low 
SNR. Interestingly, it is seen that there is a discontinuity in the optimal correlation strength as a 
function of SNR. In the perfectly correlated case, the asymptotic per-node KLI and MI are zero 
as expected. As a function of SNR, the asymptotic per-node information increases as log SNR 
for a given correlation strength at high SNR. At low SNR, the two information measures show 
different rates of convergence to zero. 

Based on the derived expressions for asymptotic per-node information and their properties 
under the SFCAR and corresponding correlation function, we then investigate the fundamental 
behavior of large scale ad hoc sensor networks deployed over correlated random fields for statistical 
inference. Specifically, we examine the total information [nats] (about the underlying physical 
process) obtainable from the network and the energy efficiency [nats/J] defined as the ratio of
total gathered information to the required energy as the coverage, density and energy vary. We
assume that sensors are located on a 2-D lattice and all sensor nodes in the network deliver the
measured data to a fusion center in the center of the 2-D lattice via minimum hop routing without 
in-network data fusion. Under these assumptions, we have the following results on the trade-offs 
among the information, coverage, density and energy, and the results provide guidelines for the 
design of sensor networks for statistical inference about many interesting physical processes that 
can be modelled as 2-D correlated random fields: 

(1) When the sensor node density is fixed, the amount of total information increases linearly
with respect to (w.r.t.) the coverage area, and the energy efficiency decreases to zero with rate
$\Theta(\mathrm{area}^{-1/2})$ as the coverage area increases. Further, in this case the amount of information per
sensor node diminishes to zero as the network size grows with fixed energy per node.

(2) As the sensor spacing $d_n$ increases, the per-node information converges to its limit $D$ with
rate $D - \sqrt{d_n}\, e^{-\alpha d_n}$ for a given diffusion rate $\alpha$. Hence, the per-node information saturates almost
exponentially as we increase the sensor spacing.

(3) When the coverage area is fixed and the node density increases, the per-node information
is inversely proportional to the node density for any nontrivial diffusion rate. Hence, the total
amount of information from a given area is upper bounded unless the random field is spatially
white.

(4) As the total energy $E_t$ consumed in the network increases, the total information obtainable
from the network is given by $\Theta(E_t^{2/3})$ for fixed node density and increasing coverage, whereas
the total information increases only with rate $O(\log E_t)$ for fixed node density and fixed
coverage.
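The $\mathrm{area}^{-1/2}$ rate in (1) and the $E_t^{2/3}$ rate in (4) can be made plausible with a rough count. The following minimal Python sketch assumes, for illustration only, that every hop costs the same energy, that a packet from node $(i,j)$ needs $|i-c|+|j-c|$ hops to reach the center node $c$, that every node contributes the same amount of information, and that sensing energy is ignored. The function names `total_hops` and `efficiency` are hypothetical and do not come from the paper.

```python
def total_hops(n: int) -> int:
    """Total hop count when every node of an n x n lattice (n odd) sends one
    packet to the center node along a shortest lattice path (assumed routing)."""
    c = (n - 1) // 2  # index of the center node along each axis
    return sum(abs(i - c) + abs(j - c) for i in range(n) for j in range(n))


def efficiency(n: int, info_per_node: float = 1.0, energy_per_hop: float = 1.0) -> float:
    """Information per unit energy, counting communication energy only (assumption)."""
    return (n * n * info_per_node) / (total_hops(n) * energy_per_hop)


if __name__ == "__main__":
    for n in (11, 21, 41, 81):
        area = n * n  # unit spacing, so the area is proportional to the node count
        # efficiency * sqrt(area) levels off, i.e. efficiency scales like area^(-1/2)
        print(n, total_hops(n), efficiency(n) * area ** 0.5)
```

Under these assumptions the hop count equals $n(n^2-1)/2$ while the information grows like $n^2$, so the efficiency behaves like $2/n$, that is, like $\mathrm{area}^{-1/2}$, and the information behaves like the $2/3$ power of the communication energy.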

B. Related Work 

Large deviations analysis of Gaussian processes in Gaussian noise has been considered previ- 
ously, e.g., [8-13]. However, most work in this area considers only 1-D signals or time series. 
A closed-form expression for the asymptotic KLI rate was obtained and its properties were in- 
vestigated for 1-D hidden Gauss-Markov random processes in [12]. Large deviations analyses 
were used to examine the issues of optimal sensor density and optimal sampling in a 1-D signal 
model in [2] and [4]. For a 2-D setting, an error exponent was obtained for the detection of 2-D
GMRFs in [14], where the sensors are located randomly and the Markov graph is based on the
nearest neighbor dependency enabling a loop-free graph. Our work here focuses on the analysis
of the fundamental behavior of 2-D sensor networks deployed for statistical inference via new
large deviations results for general $d$-D and 2-D stationary Gaussian random fields and their
application to 2-D SFCAR GMRFs. These results enable us to investigate the impact of field
correlation and measurement SNR on the information and the fundamental behavior of ad hoc sensor
networks for statistical inference. A preliminary version of this work was presented in [15].

C. Notation and Organization 

We will make use of standard notational conventions. Vectors and matrices are written in
boldface with matrices in capitals. All vectors are column vectors. For a matrix $\mathbf{A}$, $\mathbf{A}^T$ indicates
the transpose and $\mathbf{A}(i,j)$ denotes the $(i,j)$-th element of $\mathbf{A}$. We reserve $\mathbf{I}_m$ for the identity
matrix of size $m$ (the subscript is included only when necessary). For a random vector $\mathbf{x}$, $\mathbb{E}_j\{\mathbf{x}\}$
is the expectation of $\mathbf{x}$ under probability density $p_j$, $j = 0, 1$. The notation $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ means
that $\mathbf{x}$ is Gaussian distributed with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. For a set $\mathcal{A}$, $|\mathcal{A}|$
denotes the cardinality of $\mathcal{A}$.

The paper is organized as follows. The background and signal model are described in Section 
II. In Section III, the closed-form expressions for the asymptotic KLI and MI rates are obtained 
in the spectral domain, and their properties are investigated as functions of the correlation and 
the SNR under the symmetric first order CAR model. The trade-offs related to ad hoc sensor 
networks deployed for statistical inference are presented in Section IV, followed by conclusions 
in Section V. 

II. Background and Signal Model 

We assume that sensors are distributed over a 2-D area and each sensor measures the underlying 
signal field at its location. To simplify the problem and gain insights into behavior in 2-D, we 
assume that sensors are located on a 2-D square lattice 

$$\mathcal{I}_n = \{(i,j),\ i = 0, 1, \ldots, n-1, \text{ and } j = 0, 1, \ldots, n-1\}, \tag{1}$$

where the distance between two adjacent nodes $(i,j)$ and $(i+1,j)$ is $d_n$, as shown in Fig. 2. (We
will use $ij$ to denote $(i,j)$ when there is no ambiguity of notation.) We model the 2-D signal field
$\{X_{ij}, ij \in \mathcal{I}_n\}$ (or simply $\{X_{ij}\}$) sampled by sensors as a GMRF¹ w.r.t. an undirected graph in
which a node corresponds to a sensor node or its signal sample. We assume that each sensor has
Gaussian measurement noise. The noisy measurement $Y_{ij}$ of Sensor $ij$ on the 2-D lattice $\mathcal{I}_n$ is
then given by

$$Y_{ij} = X_{ij} + W_{ij}, \quad ij \in \mathcal{I}_n, \tag{2}$$

where $\{W_{ij}\}$ represents independent and identically distributed (i.i.d.) $\mathcal{N}(0, \sigma^2)$ noise with a
known variance $\sigma^2$, and the GMRF $\{X_{ij}\}$ is assumed to be independent of the measurement
noise $\{W_{ij}\}$. Thus, the observation samples form a 2-D hidden GMRF.² In the following, we
briefly review results on GMRFs relevant to our further development.

¹ The Markov dependence structure may be restrictive. However, it is a meaningful model capturing 2-D spatial
correlation structure and allowing further analysis.

Definition 1 (Undirected graph) An undirected labelled graph $\mathcal{G}$ is a collection $(\mathcal{N}, \mathcal{E})$ of nodes
and edges, where $\mathcal{N} = \{1, 2, \ldots, N\}$ is the set of nodes in the graph, and $\mathcal{E}$ is the set of edges
$\{(l,m) : l, m \in \mathcal{N} \text{ and } l \neq m\}$. There exists an undirected edge between two nodes $l$ and $m$ if
and only if $(l,m) \in \mathcal{E}$.

We will use the terms node, sample and sensor interchangeably hereafter. 

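Definition 2 (GMRF) A Gaussian random vector $\mathbf{x} = [X_1, X_2, \ldots, X_N]^T \in \mathbb{R}^N$ with mean
vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma} > 0$ is a GMRF w.r.t. a labelled graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ if $X_l$ and
$X_m$ are independent given $X_{-lm}$ if and only if there exists no edge between nodes $l$ and $m$, where
$X_{-lm} = \{X_k,\ k \in \mathcal{N} \text{ and } k \neq l, m\}$.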




Fig. 2. Sensors on a 2-D lattice: hidden Markov structure (label in figure: measurement noise).

Note that a GMRF is defined using conditional independence on a graph. However, its distribution
is easily characterized by the mean $\boldsymbol{\mu}$ and the precision matrix $\mathbf{Q}$ ($= \boldsymbol{\Sigma}^{-1}$), and is given by

$$p(\mathbf{x}) = (2\pi)^{-N/2} |\mathbf{Q}|^{1/2} \exp\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \mathbf{Q} (\mathbf{x} - \boldsymbol{\mu})\right), \tag{3}$$

and $Q_{lm} \neq 0$ if and only if $(l,m) \in \mathcal{E}$ for all $l \neq m$, i.e.,

$$Q_{lm} = 0 \iff X_l \perp X_m \mid X_{-lm}. \tag{4}$$

² In this paper, we focus primarily on the spatial correlation structure of 2-D sensor fields, and the signal evolution
over time is not considered.

Note that the covariance matrix $\boldsymbol{\Sigma}$ is completely dense in general while the precision matrix $\mathbf{Q}$
has nonzero elements $Q_{lm}$ only when there is an edge between nodes $l$ and $m$ in the Markov
random field. Hence, when the graph is not fully connected, the precision matrix is sparse [16].
The 2-D indexing scheme in (1) and (2) can properly be converted to a 1-D scheme to apply
Definitions 1 and 2. From here on, we again use the 2-D indexing scheme for convenience.
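The contrast between a sparse precision matrix and a dense covariance matrix can be checked on a toy case. The sketch below is an illustration under assumed values, not the paper's model: it uses a 1-D path graph with hypothetical coefficients (diagonal 1, neighbor entries -0.4) and a small hand-written Gauss-Jordan inverse, so that only the Python standard library is needed.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def chain_precision(n_nodes, diag=1.0, off=-0.4):
    """Precision matrix Q of a GMRF on the path graph 1 - 2 - ... - N:
    Q[k][m] is nonzero only when k == m or nodes k and m share an edge."""
    q = [[0.0] * n_nodes for _ in range(n_nodes)]
    for k in range(n_nodes):
        q[k][k] = diag
        if k + 1 < n_nodes:
            q[k][k + 1] = q[k + 1][k] = off
    return q


if __name__ == "__main__":
    Q = chain_precision(5)
    Sigma = invert(Q)  # covariance matrix
    print("zero entries in Q:    ", sum(v == 0.0 for row in Q for v in row))
    print("zero entries in Sigma:", sum(abs(v) < 1e-12 for row in Sigma for v in row))
```

Every pair of nodes in the chain is correlated, so the covariance has no zero entries, while the precision matrix only records the edges.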

Definition 3 (Stationarity) A GMRF $\{X_{ij}\}$ on a 2-D infinite lattice $\mathcal{I}_\infty$ is said to be (second
order) stationary if the mean vector is constant and the covariance between samples $X_{ij}$ and
$X_{i'j'}$ depends only on the difference of the node index, i.e.,

$$\mathrm{Cov}(X_{ij}, X_{i'j'}) = \mathbb{E}\{(X_{ij} - \mu)(X_{i'j'} - \mu)\} = c(i - i', j - j')$$

for some function $c(\cdot, \cdot)$, where $\mu$ is the mean of the stationary field.

Without loss of generality, we assume that the signal GMRF $\{X_{ij}\}$ is zero-mean.³ For a 2-D
zero-mean and stationary GMRF $\{X_{ij}\}$, the covariance $\{\gamma_{ij}\}$ is defined as

$$\gamma_{ij} = \mathbb{E}\{X_{i'j'} X_{i'+i,\,j'+j}\} = \mathbb{E}\{X_{00} X_{ij}\}, \tag{5}$$

which does not depend on $i'$ or $j'$ due to the stationarity. The spectral density function of a
stationary GMRF $\{X_{ij}\}$ on $\mathcal{I}_\infty$ with covariance $\gamma_{ij}$ is defined as

$$f(\omega_1, \omega_2) = \frac{1}{4\pi^2} \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \gamma_{ij}\, e^{-\iota(i\omega_1 + j\omega_2)}, \tag{6}$$

where $\iota = \sqrt{-1}$ and $(\omega_1, \omega_2) \in [-\pi, \pi)^2$. Note that (6) is a 2-D extension of the conventional
1-D Fourier transform. We can express $\{\gamma_{ij}\}$ from the spectral density function via the inverse
transform

$$\gamma_{ij} = \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(\omega_1, \omega_2)\, e^{\iota(i\omega_1 + j\omega_2)}\, d\omega_1\, d\omega_2. \tag{7}$$

³ Of course, if a stationary GMRF has a known and non-zero mean, the known mean can be subtracted to yield
a zero-mean field.

A stationary GMRF can be implicitly specified by a conditional autoregressive (CAR) model,
which is a natural generalization of the autoregressive (AR) model arising in 1-D time series and 
which provides an efficient tool for capturing the spatial correlation structure of the sensor field 
considered here. 

Definition 4 (The conditional autoregression [16]) A zero-mean CAR GMRF is defined by a
set of full conditional normal distributions with mean and precision:

$$\mathbb{E}\{X_{ij} \mid X_{-ij}\} = -\frac{1}{\theta_{00}} \sum_{(i',j') \neq (0,0)} \theta_{i'j'}\, X_{i+i',\,j+j'}, \tag{8}$$

and

$$\mathrm{Var}^{-1}\{X_{ij} \mid X_{-ij}\} = \theta_{00} > 0, \tag{9}$$

where $X_{-ij}$ denotes the set of all variables except $X_{ij}$.

Note in (8) that the conditional mean of $X_{ij}$ given all other node variables depends on nodes
$(i + i', j + j')$ such that $\theta_{i'j'} \neq 0$, and the relationship between the CAR model of (8) and (9)
and the precision matrix is given by

$$Q_{(ij),(i+i',\,j+j')} = \theta_{i'j'}. \tag{10}$$

Hence, the Markov dependence structure on the graph is easily captured by the CAR model
through (4), and $\{\theta_{i'j'}\}$ directly represent the connectivity of the Markov graph.
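Relation (10) says that the CAR coefficients are the rows of the precision matrix. A minimal sketch of this bookkeeping follows. It assumes a finite $n \times n$ lattice with periodic wrap-around (a simplification that is not part of the model above, where boundary nodes simply have fewer neighbors), maps node $(i,j)$ to the 1-D index $i\,n + j$, and uses a hypothetical four-neighbor coefficient set; the names `car_precision` and `four_neighbor_theta` are illustrative only.

```python
def car_precision(n, theta):
    """Precision matrix of a CAR model on an n x n lattice with periodic wrap-around.

    theta maps an offset (di, dj) to the coefficient theta_{di,dj}, so that
    Q[(i, j), (i + di, j + dj)] = theta_{di,dj} as in relation (10).
    """
    size = n * n
    q = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            row = i * n + j  # 1-D index of node (i, j)
            for (di, dj), coeff in theta.items():
                col = ((i + di) % n) * n + (j + dj) % n
                q[row][col] += coeff
    return q


def four_neighbor_theta(center, neighbor):
    """Hypothetical coefficient set: one value on the node itself and a common
    value on its four nearest neighbors."""
    return {(0, 0): center, (1, 0): neighbor, (-1, 0): neighbor,
            (0, 1): neighbor, (0, -1): neighbor}


if __name__ == "__main__":
    Q = car_precision(4, four_neighbor_theta(1.0, -0.2))
    nonzeros_per_row = {sum(1 for v in row if v != 0.0) for row in Q}
    print("nonzero entries per row:", nonzeros_per_row)
```

Each row has one nonzero entry per nonzero coefficient, which is the sparsity pattern of the Markov graph.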

Theorem 1 (Spectrum of a CAR model [16]) The GMRF defined by the CAR model of (8)
and (9) is a zero-mean stationary Gaussian process on $\mathcal{I}_\infty$ with the spectral density function

$$f(\omega_1, \omega_2) = \frac{1}{4\pi^2} \cdot \frac{1}{\sum_{ij \in \mathcal{I}_\infty} \theta_{ij} \exp(-\iota(i\omega_1 + j\omega_2))}, \tag{11}$$

if

$$|\{\theta_{ij} \neq 0\}| < \infty, \quad \theta_{ij} = \theta_{-i,-j}, \quad \theta_{00} > 0, \tag{12}$$

and

$$\{\theta_{ij}\} \text{ is such that } f(\omega_1, \omega_2) > 0, \quad \forall (\omega_1, \omega_2) \in [-\pi, \pi)^2. \tag{13}$$

Henceforth, we assume that the 2-D stochastic signal $\{X_{ij}\}$ in (2) is given by a stationary GMRF
defined by the CAR model of (8) and (9) satisfying (12) and (13) as $n \to \infty$.
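The spectrum (11) and the conditions (12) and (13) are easy to evaluate for a given finite coefficient set. The sketch below is a minimal illustration: the coefficient dictionary, the function names, and the use of a finite frequency grid as a numerical stand-in for the "for all frequencies" requirement in (13) are all assumptions made here. Because the symmetry in (12) makes the sum in (11) real, the code evaluates it with cosines.

```python
import math


def _denominator(theta, w1, w2):
    """Sum of theta_ij * exp(-i(i*w1 + j*w2)); real when theta_ij = theta_{-i,-j}."""
    return sum(c * math.cos(i * w1 + j * w2) for (i, j), c in theta.items())


def car_spectrum(theta, w1, w2):
    """Spectral density (11) at frequency (w1, w2), assuming symmetric coefficients."""
    return 1.0 / (4.0 * math.pi ** 2 * _denominator(theta, w1, w2))


def satisfies_conditions(theta, grid=64):
    """Check the symmetry and theta_00 > 0 parts of (12) exactly, and the
    positivity condition (13) on a grid of frequencies (a numerical proxy)."""
    symmetric = all(abs(theta.get((-i, -j), 0.0) - c) < 1e-12
                    for (i, j), c in theta.items())
    if not symmetric or theta.get((0, 0), 0.0) <= 0.0:
        return False
    freqs = [-math.pi + 2.0 * math.pi * k / grid for k in range(grid)]
    return all(_denominator(theta, w1, w2) > 0.0 for w1 in freqs for w2 in freqs)


if __name__ == "__main__":
    theta = {(0, 0): 1.0, (1, 0): -0.2, (-1, 0): -0.2, (0, 1): -0.2, (0, -1): -0.2}
    print(satisfies_conditions(theta), car_spectrum(theta, 0.0, 0.0))
```

A dictionary with finitely many entries automatically meets the finiteness part of (12).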
The SNR of the observation $Y_{ij}$ in (2) is well defined due to the stationarity as $n \to \infty$, and
is given by

$$\mathrm{SNR} = \frac{\mathbb{E}\{X_{ij}^2\}}{\mathbb{E}\{W_{ij}^2\}} = \frac{P}{\sigma^2}, \quad \forall\, ij, \tag{14}$$

where the signal power $P$ is constant over $ij \in \mathcal{I}_\infty$ and is given, using the inverse Fourier
transform of (6), by

$$P = \gamma_{00} = \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(\omega_1, \omega_2)\, d\omega_1\, d\omega_2. \tag{15}$$

III. Asymptotic Information Rates: Closed-Form Expressions and Impact of Correlation and Signal-to-Noise Ratio

In this section, we derive closed-form expressions for the asymptotic KLI and MI rates under
the 2-D CAR GMRF model discussed in the previous section. We further investigate the properties
of the asymptotic information rates under a symmetric correlation assumption. For the MI,
the signal model (2) is directly applicable, whereas for the KLI the probability density functions
of the null (noise-only) and alternative (signal-plus-noise) distributions are given by

$$p_0(Y_{ij}) : Y_{ij} = W_{ij}, \quad ij \in \mathcal{I}_n, \text{ and} \tag{16}$$
$$p_1(Y_{ij}) : Y_{ij} = X_{ij} + W_{ij}, \quad ij \in \mathcal{I}_n, \tag{17}$$

respectively. The asymptotic KLI rate $\mathscr{K}$ is defined as

$$\mathscr{K} = \lim_{n \to \infty} \frac{1}{|\mathcal{I}_n|} \log \frac{p_0}{p_1}(\{Y_{ij}, ij \in \mathcal{I}_n\}) \quad \text{almost surely (a.s.) under } p_0, \tag{18}$$

where $p_0$ and $p_1$ are given by (16) and (17), respectively. Under a Neyman-Pearson detection
formulation, the miss probability $P_M$ decays exponentially in many cases, including (16) and
(17), and the error exponent is defined as the exponential decay rate

$$\lim_{n \to \infty} -\frac{1}{|\mathcal{I}_n|} \log P_M, \tag{19}$$

where $|\mathcal{I}_n|$ is the total number of samples in $\mathcal{I}_n$. It is known that the error exponent is given by
the asymptotic KLI rate $\mathscr{K}$ defined in (18) in this case [17]. Hence, a larger KLI rate (or per-node
KLI) implies better detection performance with a given network size, or a smaller network size
required for a given level of performance.
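The trade-off between the KLI rate and the network size can be illustrated with a first-order reading of (19). The sketch below assumes the crude approximation $P_M \approx \exp(-|\mathcal{I}_n| \mathscr{K})$, which keeps only the exponential decay rate and ignores sub-exponential factors, so the numbers are indicative only. The function names are hypothetical.

```python
import math


def approx_miss_probability(num_nodes, kli_rate):
    """First-order approximation P_M ~ exp(-num_nodes * kli_rate) (assumption:
    only the exponential decay rate is kept)."""
    return math.exp(-num_nodes * kli_rate)


def nodes_for_target(target_miss, kli_rate):
    """Smallest node count whose first-order approximation meets the target."""
    return math.ceil(-math.log(target_miss) / kli_rate)


if __name__ == "__main__":
    for rate in (0.25, 0.5, 1.0):  # per-node KLI in nats
        n = nodes_for_target(1e-3, rate)
        print(rate, n, approx_miss_probability(n, rate))
```

In this approximation, doubling the per-node KLI roughly halves the number of nodes needed for the same target miss probability.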

While the asymptotic KLI rate determines the error exponent for Neyman-Pearson detection,
the asymptotic MI rate is interpreted as the amount of uncertainty reduction about the hidden
signal field resulting from one observation sample, in the large sample size regime. The
asymptotic MI rate $\mathscr{I}$ is given by

$$\mathscr{I} = \lim_{n \to \infty} \frac{1}{|\mathcal{I}_n|} I(\{X_{ij}, ij \in \mathcal{I}_n\}; \{Y_{ij}, ij \in \mathcal{I}_n\})$$
$$= \lim_{n \to \infty} \frac{1}{|\mathcal{I}_n|} \left[ H(\{X_{ij}, ij \in \mathcal{I}_n\}) - H(\{X_{ij}, ij \in \mathcal{I}_n\} \mid \{Y_{ij}, ij \in \mathcal{I}_n\}) \right]. \tag{20}$$

It is shown in the sequel that the asymptotic KLI rate is smaller than the asymptotic MI rate
and that the two information measures converge when SNR increases. Thus, at high SNR the
two information measures are equivalent.

A. Asymptotic Information Rates in General d-Dimension 

While the 2-D results are relevant to our analysis of fundamental trade-offs in planar sensor 
networks, it is of theoretical interest to investigate the statistical properties of stationary Gaussian 
random fields in general higher dimension. In this section, we first derive closed-form expressions 
for the asymptotic KLI and MI rates for stationary Gaussian random fields in d-D, and then 
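apply the results to the 2-D case. For a stationary $d$-D Gaussian random field $\{Y_{\mathbf{i}}, \mathbf{i} \in \mathbb{Z}^d\}$,
where $\mathbb{Z}$ is the set of all integers, the autocovariance function under $p_1$ is given by

$$\gamma_{\mathbf{h}} = \mathbb{E}_1\{Y_{\mathbf{i}} Y_{\mathbf{i}+\mathbf{h}}\}, \quad \mathbf{h} = (h_1, h_2, \ldots, h_d) \in \mathbb{Z}^d, \tag{21}$$

and the corresponding Fourier transform (i.e., the power spectral density) and its inverse are
given by

$$f_1(\boldsymbol{\omega}) = \frac{1}{(2\pi)^d} \sum_{\mathbf{h} \in \mathbb{Z}^d} \gamma_{\mathbf{h}}\, e^{-\iota \mathbf{h} \cdot \boldsymbol{\omega}}, \tag{22}$$

and

$$\gamma_{\mathbf{h}} = \int e^{\iota \mathbf{h} \cdot \boldsymbol{\omega}} f_1(\boldsymbol{\omega})\, d\boldsymbol{\omega}, \tag{23}$$

respectively, where the integration is over $\boldsymbol{\omega} \in [-\pi, \pi)^d$, and $\mathbf{h} \cdot \boldsymbol{\omega}$ denotes the inner product
between $\mathbf{h}$ and $\boldsymbol{\omega}$. Note that (21), (22) and (23) are the extensions of (5), (6) and (7), respectively,
to $d$-D. The null and alternative distributions arising in the KLI in $d$-D are given by

$$p_0(Y_{\mathbf{i}}) : Y_{\mathbf{i}} = W_{\mathbf{i}}, \quad \mathbf{i} \in \mathcal{D}_n,$$
$$p_1(Y_{\mathbf{i}}) : Y_{\mathbf{i}} = Y_{\mathbf{i}}^{(1)}, \quad \mathbf{i} \in \mathcal{D}_n, \tag{24}$$

where $\{W_{\mathbf{i}}\}$ are i.i.d. Gaussian from $\mathcal{N}(0, \sigma^2)$, $\{Y_{\mathbf{i}}^{(1)}\}$ is a stationary $d$-D Gaussian random field
with spectrum $f_1(\boldsymbol{\omega})$,⁴ and

$$\mathcal{D}_n = [0, 1, \ldots, n-1]^d. \tag{25}$$

⁴ Note that $\{Y_{\mathbf{i}}^{(1)}\}$ need not be a hidden Markov field.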
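Based on the previous work [18], we further exploit the relationship between the eigenvalues
of block circulant and block Toeplitz matrices representing correlation structure in $d$-D and the
i.i.d. null distribution, and obtain the KLI for (24) given by the following theorem.

Theorem 2 (Asymptotic KLI rate in d-D) Suppose that
A.1 the alternative spectrum $f_1(\boldsymbol{\omega})$ has a positive lower bound, and
A.2 $\exists\, M < \infty$ such that $\forall\, k = 1, 2, \ldots, d$, $\sum_{\mathbf{h} \in \mathbb{Z}^d} (1 + |h_k|)\, |\gamma_{\mathbf{h}}| < M$.
Then, the asymptotic KLI rate $\mathscr{K}$ for (24) is given by

$$\mathscr{K} = \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \left( \frac{1}{2} \log \frac{(2\pi)^d f_1(\boldsymbol{\omega})}{\sigma^2} + \frac{1}{2} \left( \frac{\sigma^2}{(2\pi)^d f_1(\boldsymbol{\omega})} - 1 \right) \right) d\boldsymbol{\omega} \tag{26}$$
$$= \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} D\left( \mathcal{N}(0, \sigma^2) \,\|\, \mathcal{N}(0, (2\pi)^d f_1(\boldsymbol{\omega})) \right) d\boldsymbol{\omega}, \tag{27}$$

where $D(\cdot \| \cdot)$ denotes the Kullback-Leibler distance.
Proof: See Appendix I.

Theorem 2 is an extension to general $d$-D of the asymptotic KLI rate in 1-D obtained in [12],
and shows that the frequency binning interpretation of (27) holds in the general $d$-D case under
some regularity conditions on the alternative spectrum. Note that the integrand in (27) is the
Kullback-Leibler information between two zero-mean Gaussian distributions with variances $\sigma^2$
and $(2\pi)^d f_1(\boldsymbol{\omega})$, respectively. For each $d$-D frequency segment $d\boldsymbol{\omega}$, the spectra can be thought
of as being flat, i.e., the signals are independent, and Stein's lemma [19] can be applied for the
segment. The overall KLI is the sum of contributions from each bin. The smoothness of the
spectrum $f_1(\boldsymbol{\omega})$ is a sufficient condition for Assumption A.2 for second-order stationary fields,
and thus the frequency binning in Theorem 2 is valid for a wide class of spectra. Theorem 2
follows from the fact that $\mathscr{K}$ is given by the almost-sure limit of the normalized log-likelihood
ratio in (18) and that we have Gaussian distributions for $p_0$ and $p_1$. That is, $\mathscr{K}$ is given by the
almost sure limit

$$\mathscr{K} = \lim_{n \to \infty} \frac{1}{|\mathcal{D}_n|} \left[ \frac{1}{2} \log \frac{\det(\boldsymbol{\Sigma}_{1,|\mathcal{D}_n|})}{\det(\boldsymbol{\Sigma}_{0,|\mathcal{D}_n|})} + \frac{1}{2}\, \mathbf{y}_{|\mathcal{D}_n|}^T \left( \boldsymbol{\Sigma}_{1,|\mathcal{D}_n|}^{-1} - \boldsymbol{\Sigma}_{0,|\mathcal{D}_n|}^{-1} \right) \mathbf{y}_{|\mathcal{D}_n|} \right] \quad \text{under } p_0, \tag{28}$$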
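where $\mathbf{y}_{|\mathcal{D}_n|}$ is a vector consisting of $|\mathcal{D}_n|$ observation samples $\{Y_{\mathbf{i}}, \mathbf{i} \in \mathcal{D}_n\}$ with elements arranged
in lexicographic order; for example, in 2-D

$$\mathbf{y}_{|\mathcal{I}_n|} = [y_1, \ldots, y_{|\mathcal{I}_n|}]^T = [Y_{00}, Y_{10}, \ldots, Y_{n-1,0}, Y_{01}, \ldots, Y_{n-1,n-1}]^T, \tag{29}$$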



March 9, 2009. DRAFT. To appear in IEEE Trans. on Information Theory, June 2009.

and $\boldsymbol{\Sigma}_{0,|\mathcal{D}_n|}$ and $\boldsymbol{\Sigma}_{1,|\mathcal{D}_n|}$ are the covariance matrices of $\mathbf{y}_{|\mathcal{D}_n|}$ under $p_0$ and $p_1$, respectively. Note
that the log-likelihood ratio in (28) consists of two terms: one is a deterministic term and the
other is a quadratic random term. The overall convergence follows from the convergence of each
of the two terms. Note that the deterministic term in (28) is simply the mutual information
between $\{X_{\mathbf{i}}, \mathbf{i} \in \mathcal{D}_n\}$ and $\{Y_{\mathbf{i}}, \mathbf{i} \in \mathcal{D}_n\}$ for the model

$$Y_{\mathbf{i}} = X_{\mathbf{i}} + W_{\mathbf{i}}, \quad \mathbf{i} \in \mathcal{D}_n. \tag{30}$$

Using the convergence of the first term on the right-hand side (RHS) of (28), the asymptotic
MI rate $\mathscr{I}$ for $d$-D is given by

$$\mathscr{I} = \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{1}{2} \log \frac{\sigma^2 + (2\pi)^d f(\boldsymbol{\omega})}{\sigma^2}\, d\boldsymbol{\omega}, \tag{31}$$

where $f(\boldsymbol{\omega})$ is the spectrum of the signal $\{X_{\mathbf{i}}\}$. This is simply a $d$-D extension of the 1-D MI
rate in spectral form [20], and shows the validity of the log (1 + SNR) formula and frequency
binning approach in general $d$-D under some regularity conditions on the spectrum; a sufficient
condition is provided in Theorem 2.
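The frequency binning interpretation translates directly into a numerical recipe: split $[-\pi, \pi)^d$ into equal bins, treat the spectrum as flat on each bin, and average the per-bin contributions. The sketch below is a minimal illustration of that idea for the KLI rate of Theorem 2 and the MI rate (31). The midpoint rule, the default bin count, and the function names are choices made here, and the spectra are passed in as ordinary Python callables taking a frequency tuple.

```python
import itertools
import math


def kl_zero_mean_gauss(var0, var1):
    """D(N(0, var0) || N(0, var1)) in nats."""
    return 0.5 * math.log(var1 / var0) + 0.5 * (var0 / var1 - 1.0)


def _bin_midpoints(d, bins):
    """Midpoints of a uniform partition of [-pi, pi)^d into bins**d cells."""
    axis = [-math.pi + 2.0 * math.pi * (k + 0.5) / bins for k in range(bins)]
    return itertools.product(axis, repeat=d)


def kli_rate(f1, d, noise_var, bins=16):
    """Average over bins of D(N(0, sigma^2) || N(0, (2 pi)^d f1(w)))."""
    scale = (2.0 * math.pi) ** d
    total = sum(kl_zero_mean_gauss(noise_var, scale * f1(w)) for w in _bin_midpoints(d, bins))
    return total / bins ** d


def mi_rate(f, d, noise_var, bins=16):
    """Average over bins of (1/2) log(1 + (2 pi)^d f(w) / sigma^2)."""
    scale = (2.0 * math.pi) ** d
    total = sum(0.5 * math.log(1.0 + scale * f(w) / noise_var) for w in _bin_midpoints(d, bins))
    return total / bins ** d


if __name__ == "__main__":
    d, power, noise_var = 2, 4.0, 1.0
    scale = (2.0 * math.pi) ** d
    signal = lambda w: power / scale                  # flat signal spectrum (white field)
    observed = lambda w: (power + noise_var) / scale  # spectrum of signal plus noise
    print(kli_rate(observed, d, noise_var), mi_rate(signal, d, noise_var))
```

For a spatially white signal the spectrum is flat, every bin contributes the same amount, and the two rates reduce to the single-sample expressions $D(\mathcal{N}(0,\sigma^2)\,\|\,\mathcal{N}(0,\sigma^2+P))$ and $\frac{1}{2}\log(1+P/\sigma^2)$.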

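Applying the $d$-D results to the 2-D hidden GMRF model of (16) and (17), we have the
following corollary for 2-D.

Corollary 1 (Asymptotic information rates in 2-D) Assuming that the conditions (12) and (13)
hold, the asymptotic KLI and MI rates for the hidden CAR GMRF model with (16) and (17)
are given by

$$\mathscr{K} = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \left( \frac{1}{2} \log \frac{\sigma^2 + 4\pi^2 f(\omega_1, \omega_2)}{\sigma^2} + \frac{1}{2} \left( \frac{\sigma^2}{\sigma^2 + 4\pi^2 f(\omega_1, \omega_2)} - 1 \right) \right) d\omega_1\, d\omega_2, \tag{32}$$

and

$$\mathscr{I} = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \frac{1}{2} \log \frac{\sigma^2 + 4\pi^2 f(\omega_1, \omega_2)}{\sigma^2}\, d\omega_1\, d\omega_2, \tag{33}$$

where $f(\omega_1, \omega_2)$ is the 2-D spectrum of the signal GMRF $\{X_{ij}, ij \in \mathcal{I}_\infty\}$ defined in (11).
Proof: See Appendix I.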

Comparing (32) and (33), we note that the asymptotic KLI rate is strictly less than the
asymptotic MI rate for any positive signal spectrum, and that the two information measures
differ by a fixed offset of $-1/2$ in the limit as the SNR increases without bound, since
$\sigma^2 / (\sigma^2 + 4\pi^2 f(\omega_1, \omega_2)) \to 0$ in (32) as SNR $\to \infty$. Hence, the two information measures
can be equivalently used at high SNR.
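The offset can be read off by subtracting the two rates of Corollary 1. The short derivation below is a worked restatement that uses only the expressions for $\mathscr{K}$ and $\mathscr{I}$ in the corollary, with $f > 0$ and $\sigma^2 > 0$:

```latex
\begin{align*}
\mathscr{I} - \mathscr{K}
  &= \frac{1}{4\pi^2} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}
     \frac{1}{2}\left(1 - \frac{\sigma^2}{\sigma^2 + 4\pi^2 f(\omega_1,\omega_2)}\right)
     d\omega_1\, d\omega_2 \\
  &= \frac{1}{2} - \frac{1}{4\pi^2} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}
     \frac{1}{2}\,\frac{\sigma^2}{\sigma^2 + 4\pi^2 f(\omega_1,\omega_2)}\,
     d\omega_1\, d\omega_2 .
\end{align*}
```

The remaining integrand lies strictly between $0$ and $1/2$, so $0 < \mathscr{I} - \mathscr{K} < 1/2$, and for a fixed positive spectrum the gap tends to $1/2$ as $\sigma^2 \to 0$.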
B. Symmetric First-Order Conditional Autoregression

In the previous section, we have derived closed-form expressions for the asymptotic KLI and MI 
rates for hidden CAR GMRFs with general 2-D spectra defined in (11) in the spectral domain. 
However, these general spectral expressions render further analysis infeasible. To investigate 
the impact of the field correlation and the SNR on the information rates, we further adopt the 
symmetric first order conditional autoregression (SFCAR) model, described by the conditions 

$$\mathbb{E}\{X_{ij} \mid X_{-ij}\} = \frac{\lambda}{\kappa} \left( X_{i+1,j} + X_{i-1,j} + X_{i,j+1} + X_{i,j-1} \right), \tag{34}$$

and

$$\mathrm{Var}^{-1}\{X_{ij} \mid X_{-ij}\} = \kappa > 0, \tag{35}$$

where $0 \le \lambda \le \kappa/4$.⁵ Note that the parameters in (8) and (9) for this model are given by $\theta_{00} = \kappa$,
$\theta_{1,0} = \theta_{-1,0} = \theta_{0,1} = \theta_{0,-1} = -\lambda$, and all other $\theta_{ij} = 0$. In this model, the correlation is
symmetric for each set of four neighboring nodes, as seen in Fig. 3.

Fig. 3. Symmetric first order conditional autoregression model: node $(i,j)$ is linked to its four
neighbors $(i,j+1)$, $(i-1,j)$, $(i+1,j)$ and $(i,j-1)$, each with coefficient $-\lambda$.

⁵ This is a sufficient condition to satisfy (12) and (13).

The SFCAR model is a
simple yet meaningful extension of the 1-D first order autoregression (AR) model which has the
conditional causal dependency only on the previous sample. Here in the 2-D SFCAR we have the
conditional dependency on four neighboring nodes in the four (planar) directions. By Theorem
1 the spectrum of the SFCAR is given by

$$f(\omega_1, \omega_2) = \frac{1}{4\pi^2 \kappa\, (1 - 2\zeta \cos \omega_1 - 2\zeta \cos \omega_2)}, \tag{36}$$
where we define the edge dependence factor $\zeta$ as

$$\zeta = \frac{\lambda}{\kappa}, \quad 0 \le \zeta \le 1/4. \tag{37}$$

Note that for the range of $0 \le \zeta \le 1/4$ the 2-D spectrum (36) is always non-negative and the
conditions (12) and (13) are satisfied. Note also that $\zeta = 0$ corresponds to the i.i.d. case whereas
$\zeta = 1/4$ corresponds to the perfectly correlated case, i.e., $X_{ij} = X_{i'j'}$ for all $i, j, i', j'$. Hence,
the correlation strength can be captured in this single quantity $\zeta$ for 2-D SFCAR signals: larger
$\zeta$ implies stronger correlation. The power of the SFCAR signal is obtained using the inverse
Fourier transform via the relation (6), and is given by [21]

$$P = \frac{2 K(4\zeta)}{\pi \kappa}, \quad (0 \le \zeta \le 1/4), \tag{38}$$

where $K(\cdot)$ is the complete elliptic integral of the first kind. The SNR is given by

$$\mathrm{SNR} = \frac{P}{\sigma^2} = \frac{2 K(4\zeta)}{\pi \kappa \sigma^2}. \tag{39}$$
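The closed form for the signal power can be cross-checked numerically against a direct integration of the SFCAR spectrum $1/(4\pi^2\kappa(1 - 2\zeta\cos\omega_1 - 2\zeta\cos\omega_2))$ over $[-\pi,\pi)^2$. The sketch below is a minimal check under two assumptions that should be verified against [21]: $K(\cdot)$ is taken in the modulus convention, $K(k) = \int_0^{\pi/2} (1 - k^2 \sin^2 t)^{-1/2}\, dt$, and it is evaluated with the arithmetic-geometric mean identity $K(k) = \pi / (2\,\mathrm{AGM}(1, \sqrt{1-k^2}))$. The function names are hypothetical.

```python
import math


def ellipk_modulus(k):
    """Complete elliptic integral of the first kind K(k), modulus convention,
    via the arithmetic-geometric mean; valid for 0 <= k < 1."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def sfcar_power_closed_form(kappa, zeta):
    """Signal power P = 2 K(4 zeta) / (pi kappa), for 0 <= zeta < 1/4."""
    return 2.0 * ellipk_modulus(4.0 * zeta) / (math.pi * kappa)


def sfcar_power_numeric(kappa, zeta, bins=128):
    """Midpoint-rule integral of the SFCAR spectrum over [-pi, pi)^2."""
    axis = [-math.pi + 2.0 * math.pi * (k + 0.5) / bins for k in range(bins)]
    cell = (2.0 * math.pi / bins) ** 2  # area of one frequency cell
    total = 0.0
    for w1 in axis:
        for w2 in axis:
            total += 1.0 / (4.0 * math.pi ** 2 * kappa
                            * (1.0 - 2.0 * zeta * math.cos(w1) - 2.0 * zeta * math.cos(w2)))
    return total * cell


def sfcar_snr(kappa, zeta, noise_var):
    """SNR = P / sigma^2."""
    return sfcar_power_closed_form(kappa, zeta) / noise_var


if __name__ == "__main__":
    for zeta in (0.0, 0.1, 0.2):
        print(zeta, sfcar_power_closed_form(1.0, zeta), sfcar_power_numeric(1.0, zeta))
```

Libraries that parameterize the elliptic integral by $m = k^2$ rather than by the modulus $k$ would need the argument $(4\zeta)^2$ under the convention assumed here.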

Using (32), (36) and (39), we now obtain the asymptotic KLI and MI rates in the SFCAR signal
case, denoted by $\mathscr{K}_s$ and $\mathscr{I}_s$, and given in the following corollary to Corollary 1.

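Corollary 2: For the hidden 2-D SFCAR signal model the asymptotic per-node KLI $\mathscr{K}_s$ is
given by

$$\mathscr{K}_s = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \left[ \frac{1}{2} \log \left( 1 + \frac{\mathrm{SNR}}{(2/\pi) K(4\zeta)(1 - 2\zeta \cos \omega_1 - 2\zeta \cos \omega_2)} \right) + \frac{1}{2} \left( \frac{1}{1 + \frac{\mathrm{SNR}}{(2/\pi) K(4\zeta)(1 - 2\zeta \cos \omega_1 - 2\zeta \cos \omega_2)}} - 1 \right) \right] d\omega_1\, d\omega_2, \tag{40}$$

and the asymptotic per-node MI $\mathscr{I}_s$ is given by

$$\mathscr{I}_s = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} \frac{1}{2} \log \left( 1 + \frac{\mathrm{SNR}}{(2/\pi) K(4\zeta)(1 - 2\zeta \cos \omega_1 - 2\zeta \cos \omega_2)} \right) d\omega_1\, d\omega_2. \tag{41}$$

Proof: The result follows upon substitution of (36) and (39) into (32) and (33), respectively.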



Note that the SNR for the hidden SFCAR model is dependent on correlation through $\zeta$ (see
(39)). However, the SNR and correlation are separated in the expressions (40) and (41) for the
asymptotic per-node information, which enables us to investigate the effects of each term on the
per-sample information separately.
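Because the SNR and $\zeta$ enter separately, the two rates of Corollary 2 can be evaluated on a grid of $(\mathrm{SNR}, \zeta)$ pairs. The sketch below is a minimal numerical illustration: it integrates $\frac{1}{2}\log(1 + r)$ and $\frac{1}{2}\log(1 + r) + \frac{1}{2}(1/(1+r) - 1)$ with $r = \mathrm{SNR} / ((2/\pi) K(4\zeta)(1 - 2\zeta\cos\omega_1 - 2\zeta\cos\omega_2))$ by a midpoint rule. The modulus convention for $K(\cdot)$, the grid size, and the function names are assumptions made here, and the SNR is passed as a linear ratio rather than in dB.

```python
import math


def ellipk_modulus(k):
    """K(k) in the modulus convention via the arithmetic-geometric mean (0 <= k < 1)."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(60):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def sfcar_rates(snr, zeta, bins=64):
    """Return (per-node KLI, per-node MI) in nats for the hidden SFCAR model,
    for a linear-scale SNR and an edge dependence factor 0 <= zeta < 1/4."""
    norm = (2.0 / math.pi) * ellipk_modulus(4.0 * zeta)
    axis = [-math.pi + 2.0 * math.pi * (k + 0.5) / bins for k in range(bins)]
    kli = mi = 0.0
    for w1 in axis:
        for w2 in axis:
            ratio = snr / (norm * (1.0 - 2.0 * zeta * math.cos(w1) - 2.0 * zeta * math.cos(w2)))
            log_term = 0.5 * math.log1p(ratio)
            mi += log_term
            kli += log_term + 0.5 * (1.0 / (1.0 + ratio) - 1.0)
    cells = bins * bins  # averaging over cells equals (1 / 4 pi^2) times the integral
    return kli / cells, mi / cells


if __name__ == "__main__":
    snr = 10.0 ** (10.0 / 10.0)  # 10 dB expressed as a linear ratio
    for zeta in (0.0, 0.05, 0.1, 0.15, 0.2):
        print(zeta, sfcar_rates(snr, zeta))
```

The value $\zeta = 1/4$ is excluded because $K(1)$ is infinite; values close to $1/4$ need a finer grid since the integrand becomes sharply peaked near the origin.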
B.1 Properties of the asymptotic per-node KLI and MI for the hidden SFCAR model

First, it is readily seen from Corollary 2 that the asymptotic per-node KLI $\mathscr{K}_s$ and MI $\mathscr{I}_s$ are
continuously differentiable functions of the edge dependence factor $\zeta$ ($0 \le \zeta \le 1/4$) for a given
SNR since $f : x \to K(x)$ is a continuously differentiable $C^\infty$ function for $0 \le x < 1$ [22]. Now
we examine the asymptotic behavior of $\mathscr{K}_s$ and $\mathscr{I}_s$ as functions of $\zeta$. The values of $\mathscr{K}_s$ at the
extreme correlations are given by noting the values of the complete elliptic integral at the
two extreme correlation points:

$$K(0) = \frac{\pi}{2} \quad \text{and} \quad K(1) = \infty.$$

Therefore, in the i.i.d. case (i.e., $\zeta = 0$), Corollary 2 reduces to Stein's lemma [19] as expected,
and $\mathscr{K}_s$ is given by

$$\mathscr{K}_s(0) = \frac{1}{2} \log(1 + \mathrm{SNR}) - \frac{1}{2} \left( 1 - \frac{1}{1 + \mathrm{SNR}} \right) \tag{42}$$
$$= D(\mathcal{N}(0, 1) \,\|\, \mathcal{N}(0, 1 + \mathrm{SNR})). \tag{43}$$
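As a worked check of the i.i.d. case, set $\zeta = 0$ in the per-node KLI of Corollary 2. Since $K(0) = \pi/2$, the factor $(2/\pi)K(4\zeta)$ equals one and the cosine terms vanish, so the integrand no longer depends on the frequencies:

```latex
\begin{align*}
\mathscr{K}_s(0)
  &= \frac{1}{4\pi^2} \int_{-\pi}^{\pi}\!\int_{-\pi}^{\pi}
     \left[ \frac{1}{2}\log(1 + \mathrm{SNR})
          + \frac{1}{2}\left( \frac{1}{1 + \mathrm{SNR}} - 1 \right) \right]
     d\omega_1\, d\omega_2 \\
  &= \frac{1}{2}\log(1 + \mathrm{SNR})
     - \frac{1}{2}\left( 1 - \frac{1}{1 + \mathrm{SNR}} \right).
\end{align*}
```

This agrees with the Kullback-Leibler distance between two zero-mean Gaussians, $D(\mathcal{N}(0, v_0)\,\|\,\mathcal{N}(0, v_1)) = \frac{1}{2}\log(v_1/v_0) + \frac{1}{2}(v_0/v_1 - 1)$, evaluated at $v_0 = 1$ and $v_1 = 1 + \mathrm{SNR}$.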

For the perfectly correlated case (C = 1/4), on the other hand, JCg = 0. In fact, in this case as 
well as in the i.i.d. case, the two-dimensionality is irrelevant. The known result in the 1-D case 
[12] is applicable. With regard to Js, we have similar behavior at the extreme correlations. In 
the i.i.d. case, the mutual information is given by the well known formula 

J,(0) = ilog(l + SNR), (44) 

whereas we have = in the perfectly correlated case. Thus, both information measures are 
zero at perfect correlation (C = 1/4). The limiting behavior of the asymptotic information rates 
near the extreme correlation values is given by Taylor's theorem. Due to the differentiability of 
%s and 3s w.r.t. C, we have 

^K.(C) = ci-(C-l/4)+o(|C-l/4|), (45) 

and 

Js(C) = c;-(C-l/4)+o(|C-l/4|), (46) 

in a neighborhood of $\zeta = 1/4$ for some constants $c_1$ and $c_1'$ as $\zeta \to 1/4$. Similarly, we also have the linear limiting behavior for $\mathcal{K}_s$ and $\mathcal{I}_s$ in a neighborhood of $\zeta = 0$ with non-zero limiting values, $D(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,1+\mathrm{SNR}))$ and $\frac{1}{2}\log(1+\mathrm{SNR})$, respectively, as $\zeta \to 0$. That is,

$$\mathcal{K}_s(\zeta) = \mathcal{K}_s(0) + c_2\zeta + o(\zeta), \tag{47}$$

and

$$\mathcal{I}_s(\zeta) = \mathcal{I}_s(0) + c_2'\zeta + o(\zeta), \tag{48}$$

for some $c_2$ and $c_2'$, as $\zeta \to 0$.



Fig. 4

$\mathcal{K}_s$ as a function of $\zeta$: (a) SNR = 10 dB, (b) SNR = 0 dB, (c) SNR = -3 dB, (d) SNR = -5 dB



For intermediate values of correlation, we evaluate (40) and (41) for several different SNR values, as shown in Fig. 4. It is seen that, at high SNR, $\mathcal{K}_s$ decreases monotonically as $\zeta$ increases. Hence, i.i.d. observations yield the largest per-node information for a given value of SNR when SNR is large, as in the 1-D case [12]. As we decrease the SNR, it is seen that a second mode grows near $\zeta = 1/4$, i.e., in the strong correlation region. As we decrease the SNR further, the value of $\zeta$ of the second mode shifts toward 1/4, and the value of the second mode exceeds that of the i.i.d. case. Hence, there is a discontinuity in the optimal correlation as a function of SNR in the 2-D case even if the maximal $\mathcal{K}_s$ itself is continuous, as seen in Fig. 5. That is, there is a phase transition for optimal correlation w.r.t. SNR: above a certain SNR value i.i.d. observations yield the best performance, whereas below that SNR point suddenly strong correlation is preferred. This is not the case for 1-D Gauss-Markov time series, where the optimal correlation maximizing the information rate is continuous w.r.t. SNR. Although it is not shown here, the per-node MI $\mathcal{I}_s$ exhibits similar behavior as a function of the edge dependence factor $\zeta$.

Fig. 5

Optimal $\zeta$ maximizing $\mathcal{K}_s$ vs. SNR [dB]

With regard to $\mathcal{K}_s$ and $\mathcal{I}_s$ as functions of SNR, it is straightforward to see from (40) and (41) that they are continuously differentiable functions, and the behavior of $\mathcal{K}_s$ and $\mathcal{I}_s$ with respect to SNR is given by the following theorem.

Theorem 3 (Per-node information vs. SNR) The asymptotic per-node KLI $\mathcal{K}_s$ for the hidden SFCAR model is continuous and monotonically increasing as SNR increases for a given edge dependence factor $\zeta \in [0, 1/4]$. Moreover, $\mathcal{K}_s$ increases with rate $\frac{1}{2}\log \mathrm{SNR}$ as $\mathrm{SNR} \to \infty$. As SNR decreases to zero, on the other hand, $\mathcal{K}_s$ converges to zero and the rate of convergence is given by

$$\mathcal{K}_s(\mathrm{SNR}) = c_3\cdot\mathrm{SNR}^2 + o(\mathrm{SNR}^2), \tag{49}$$

as $\mathrm{SNR} \to 0$, where $c_3$ is given by

$$c_3 = \frac{1}{2^6 K^2(4\zeta)}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\frac{1}{(1-2\zeta\cos\omega_1-2\zeta\cos\omega_2)^2}\,d\omega_1\,d\omega_2. \tag{50}$$

The per-node MI $\mathcal{I}_s$ has similar properties as a function of SNR, i.e., it is a continuous and monotonically increasing function of SNR. At high SNR, it increases with rate $\frac{1}{2}\log \mathrm{SNR}$, whereas it decreases to zero with rate of convergence

$$\mathcal{I}_s(\mathrm{SNR}) = c_3'\cdot\mathrm{SNR} + o(\mathrm{SNR}), \tag{51}$$

as $\mathrm{SNR} \to 0$, where $c_3'$ is given by

$$c_3' = \frac{1}{2^4\pi K(4\zeta)}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\frac{1}{1-2\zeta\cos\omega_1-2\zeta\cos\omega_2}\,d\omega_1\,d\omega_2. \tag{52}$$

Proof: See Appendix I.

Fig. 6

$\mathcal{K}_s$ and $\mathcal{I}_s$ as functions of SNR ($\zeta = 0.1$)



Note that the limiting behavior as $\mathrm{SNR} \to 0$ is different for $\mathcal{K}_s$ and $\mathcal{I}_s$; $\mathcal{K}_s$ decays to zero quadratically while $\mathcal{I}_s$ decreases linearly. Fig. 6 shows $\mathcal{K}_s$ and $\mathcal{I}_s$ with respect to SNR for $\zeta = 0.1$. The $\log \mathrm{SNR}$ behavior is evident at high SNR for both information measures. Note that $\mathcal{K}_s$ and $\mathcal{I}_s$ increase with the same slope in the logarithmic scale with offset 1/2. This is easily seen from (40) and (41) because the second term in the integrand of (40) converges to $-1/2$, and thus $\mathcal{K}_s - \mathcal{I}_s \to -\frac{1}{2}$ as SNR increases. However, the offset is negligible relative to the information values themselves as SNR increases. It is easy to see from (40) and (41) that for a given edge dependence factor $\zeta$ the convergence between the two information measures is characterized by $\mathcal{K}_s/\mathcal{I}_s = 1 + O(1/\log \mathrm{SNR})$ as $\mathrm{SNR} \to \infty$.
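The different low-SNR orders in (49) and (51) can be checked directly from the integrands of (40) and (41). The following is a short worked expansion rather than the appendix proof; the shorthands $g(\omega)$ and $x$ are introduced here only for illustration, and the expansion is uniform in frequency only when $\zeta < 1/4$, so that $g$ is bounded away from zero.

```latex
% Shorthand (introduced here): g(\omega) = 1 - 2\zeta\cos\omega_1 - 2\zeta\cos\omega_2,
% x = \mathrm{SNR} / \big( (2/\pi) K(4\zeta)\, g(\omega) \big).
\begin{align*}
\tfrac{1}{2}\log(1+x) &= \tfrac{x}{2} - \tfrac{x^2}{4} + O(x^3), \\
\tfrac{1}{2}\cdot\tfrac{1}{1+x} - \tfrac{1}{2} &= -\tfrac{x}{2} + \tfrac{x^2}{2} + O(x^3), \\
\text{integrand of (40)} &= \tfrac{x^2}{4} + O(x^3), \qquad
\text{integrand of (41)} = \tfrac{x}{2} + O(x^2), \\
\frac{1}{4\pi^2}\iint \frac{x^2}{4}\, d\omega_1 d\omega_2
  &= \frac{\mathrm{SNR}^2}{2^6 K^2(4\zeta)} \iint \frac{d\omega_1 d\omega_2}{g(\omega)^2}, \qquad
\frac{1}{4\pi^2}\iint \frac{x}{2}\, d\omega_1 d\omega_2
  = \frac{\mathrm{SNR}}{2^4 \pi K(4\zeta)} \iint \frac{d\omega_1 d\omega_2}{g(\omega)} .
\end{align*}
```

The linear terms cancel in the KLI integrand but not in the MI integrand, which is why the KLI vanishes quadratically and the MI only linearly as the SNR goes to zero.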

IV. Ad Hoc Sensor Networks: Fundamental Trade-Offs among Information, Coverage, Density and Energy

Using the results of the previous sections, we now answer the fundamental questions, raised in 
Section I, concerning planar ad hoc sensor networks deployed over correlated random fields for 
statistical inference under the 2-D hidden SFCAR GMRF model. We first derive relevant physical 
correlation parameters for the SFCAR from the corresponding continuous-index stochastic model.
Once the physical correlation parameters for the SFCAR are obtained, the analysis of information 
obtainable from an ad hoc sensor network and related trade-offs is straightforward. 

A. Physical Correlation Model 

We first derive how the physical correlation is related to the edge dependence factor $\zeta$ in the 2-D SFCAR model. The edge correlation coefficient $\rho$ is defined as

$$\rho \triangleq \frac{\gamma_{01}}{\gamma_{00}} = \frac{\gamma_{10}}{\gamma_{00}}, \quad (0 \le \rho \le 1), \tag{53}$$

due to the spatial symmetry, where $\gamma_{ij} = E\{X_{00}X_{ij}\}$. $\rho$ represents the correlation strength between the signal samples of two adjacent sensor nodes connected by the Markov dependence graph defined by the SFCAR model. The edge correlation coefficient $\rho$ is obtained using the following relationship [21]:

$$\kappa\gamma_{00} = 1 + 4\zeta\kappa\gamma_{01}, \quad \text{i.e.,} \quad \gamma_{01} = \frac{\kappa\gamma_{00} - 1}{4\zeta\kappa}, \tag{54}$$

and by substituting (38) and (54) into (53), we have

$$\rho = f(\zeta) = \frac{(2/\pi)K(4\zeta) - 1}{(2/\pi)(4\zeta)K(4\zeta)}. \tag{55}$$

Note that the correlation coefficient $\rho$ is not dependent on the power factor $\kappa$ in (35), as expected, even though $\gamma_{00}$ and $\gamma_{01}$ are. Note that the function $f : \zeta \to \rho$ is a continuous and differentiable function on the domain $0 \le \zeta \le 1/4$ due to the continuous differentiability of $K(x)$ for $0 \le x < 1$, with $f(1/4) = \lim_{x \to 1} \frac{(2/\pi)K(x) - 1}{(2/\pi)xK(x)} = 1$ and $f(0) = 0$ since $K(0) = \pi/2$. Thus, the inverse mapping $g : \rho \to \zeta$ from the edge correlation factor $\rho$ to the edge dependence factor $\zeta$, which maps zero and one to zero and 1/4, respectively, behaves as shown in Fig. 7 (a).
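The mapping from edge dependence factor to edge correlation and its inverse can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: it takes the edge correlation to be $((2/\pi)K(4\zeta) - 1)/(4\zeta(2/\pi)K(4\zeta))$ as obtained from (38), (53) and (54), treats the argument of $K$ as the elliptic modulus, and assumes the map is increasing so that bisection applies. The names `rho_from_zeta` and `zeta_from_rho`, and the small-argument series used to avoid cancellation, are choices made here.

```python
import math

def ellip_k(k):
    """Complete elliptic integral of the first kind K(k), modulus 0 <= k < 1 (AGM)."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-15 * a:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def rho_from_zeta(zeta):
    """Edge correlation rho = f(zeta) for 0 <= zeta < 1/4."""
    if zeta < 1e-3:
        # Series f(zeta) = zeta + 5 zeta^3 + O(zeta^5), derived from the power
        # series of K; it avoids the cancellation in s - 1 for tiny zeta.
        return zeta + 5.0 * zeta ** 3
    s = (2.0 / math.pi) * ellip_k(4.0 * zeta)
    return (s - 1.0) / (4.0 * zeta * s)

def zeta_from_rho(rho, iterations=80):
    """Inverse map zeta = g(rho) by bisection, assuming f is increasing.
    Values of rho very close to 1 are rejected because zeta is then within
    floating-point resolution of 1/4."""
    lo, hi = 0.0, 0.25 * (1.0 - 1e-12)
    if not 0.0 <= rho <= rho_from_zeta(hi):
        raise ValueError("rho outside the range this sketch can invert")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rho_from_zeta(mid) < rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because $K$ diverges only logarithmically, the correlation approaches one very slowly as the edge dependence factor approaches 1/4, which is why the inverse is so steep near perfect correlation.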

March 9, 2009 DRAFT 



TO APPEAR IN IEEE TRANS. ON INFORMATION THEORY, JUNE 2009 



Fig. 7

(a) Edge dependence factor $\zeta$ vs. edge correlation coefficient $\rho$ and (b) $\rho$ vs. edge length $d_n$



Now we consider the correlation coefficient $\rho$ as a function of the sensor spacing $d_n$. In general, the correlation function $h : d_n \to \rho$ is a positive and monotonically decreasing function of $d_n$ with $h(0) = 1$ and $h(\infty) = 0$. It is well known that for the 1-D first order AR signal a corresponding underlying (continuous-index) physical model is given by the Ornstein-Uhlenbeck process

$$\frac{ds(x)}{dx} = -As(x) + Bu(x), \tag{56}$$

and its discrete-time equivalent is given by

$$s_{i+1} = as_i + u_i, \qquad a = E\{s_i s_{i-1}\}/E\{s_i^2\} = e^{-Ad_n}, \tag{57}$$

where $A > 0$, $B \in \mathbb{R}$, $s_i = s(id_n)$, and the input processes $u(x)$ and $u_i$ are zero-mean white Gaussian processes. Here, $d_n$ is the spacing between two adjacent signal samples. For the 2-D SFCAR signal, however, the same stochastic differential equation is not applicable. Note that the dependence in the signal in (56) and (57) is only on the past in 1-D space, whereas the signal (34) has symmetric dependence in all four directions in the plane. The SFCAR signal is given by the solution of a second-order difference equation

$$X_{ij} = \zeta(X_{i+1,j} + X_{i-1,j} + X_{i,j+1} + X_{i,j-1}) + \epsilon_{ij}, \tag{58}$$

and the corresponding continuous-index physical model is given by the stochastic Laplace equation [23]

$$\left[\left(\frac{\partial}{\partial x}\right)^2 + \left(\frac{\partial}{\partial y}\right)^2 - \alpha^2\right]X(x,y) = \epsilon(x,y), \tag{59}$$

where $\alpha$ ($> 0$) is the physical diffusion rate, and $\epsilon_{ij}$ and $\epsilon(x,y)$ are 2-D white zero-mean Gaussian perturbations. Note that the solution of (59) is circularly symmetric, i.e., it depends only on $r = \sqrt{x^2 + y^2}$, and samples of the solution $X(x,y)$ of (59) on the lattice $\mathcal{I}_n$ do not form a discrete-index SFCAR GMRF. However, (59) is still the continuous-index counterpart of (58), and we use its correlation function for the SFCAR model. The correlation function corresponding to (59) is given by [23]

$$\rho = h(d_n) = \alpha d_n K_1(\alpha d_n), \tag{60}$$

where $K_1(\cdot)$ is the modified Bessel function of the second kind. Fig. 7 (b) shows the correlation function w.r.t. $d_n$ for $\alpha = 1$. The asymptotic behavior of $K_1(x)$ is given by

$$K_1(x) \sim \sqrt{\frac{\pi}{2x}}\,e^{-x} \ \text{ as } x \to \infty, \qquad K_1(x) \sim \frac{1}{x} \ \text{ as } x \to 0. \tag{61}$$

The correlation function (60) can be regarded as the representative correlation in 2-D, similar to the exponential correlation function $e^{-Ad_n}$ in 1-D. Both functions decrease monotonically w.r.t. $d_n$. However, the 2-D correlation function is flat at $d_n = 0$ [23], i.e.,

$$\left.\frac{d\rho}{dd_n}\right|_{d_n = 0} = 0, \tag{62}$$

and it decays with rate $\sqrt{d_n}\,e^{-\alpha d_n}$ as $d_n \to \infty$. Note that the 2-D correlation function has $\sqrt{d_n}$ in front of the exponential decay as $d_n \to \infty$. However, this polynomial term is not significant and the exponential decay is dominant for large $d_n$. Thus, we have $\zeta = g(h(d_n))$, and for given physical parameters (with a slight abuse of notation),

$$\mathcal{K}_s(\mathrm{SNR}, \zeta) = \mathcal{K}_s(\mathrm{SNR}, g(h(d_n))) = \mathcal{K}_s(\mathrm{SNR}, d_n),$$

and

$$\mathcal{I}_s(\mathrm{SNR}, \zeta) = \mathcal{I}_s(\mathrm{SNR}, g(h(d_n))) = \mathcal{I}_s(\mathrm{SNR}, d_n).$$

We will use the arguments SNR, $\zeta$ and $d_n$ for $\mathcal{K}_s$ and $\mathcal{I}_s$ as needed for exposition.
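The correlation function in (60) can be evaluated without a special-function library. The sketch below is illustrative only: it uses the standard integral representation $K_1(x) = \int_0^\infty e^{-x\cosh t}\cosh t\,dt$ with a trapezoidal rule, and the names `bessel_k1` and `edge_correlation`, the truncation point and the step count are assumptions made here rather than anything specified in the text.

```python
import math

def bessel_k1(x, steps=2000):
    """Modified Bessel function of the second kind of order 1, for x > 0,
    from K_1(x) = integral_0^inf exp(-x cosh t) cosh t dt (trapezoidal rule)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Truncate where exp(-x cosh t) has dropped by exp(-50) relative to t = 0.
    t_max = math.acosh(1.0 + 50.0 / x)
    dt = t_max / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)) * math.cosh(t_max))
    for i in range(1, steps):
        t = i * dt
        total += math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * dt

def edge_correlation(d, alpha):
    """rho = h(d) = alpha d K_1(alpha d) as in (60), with h(0) = 1 by continuity."""
    x = alpha * d
    return 1.0 if x == 0.0 else x * bessel_k1(x)
```

Evaluating `edge_correlation` for small and large spacings illustrates the two regimes described above: a flat top near zero spacing and an essentially exponential tail for large spacing.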
B. Scaling Laws in Ad Hoc Sensor Networks over Correlated Random Fields

In this section, we investigate the fundamental behavior of wireless flat multi-hop ad hoc sensor networks deployed for statistical inference based on the 2-D hidden SFCAR model and the corresponding correlation functions (55) and (60). We consider several criteria for determining the efficiency of the sensor network. Specifically, we consider the total amount of information [nats] obtainable from the network and the energy efficiency of a sensor network, defined as

$$\eta = \frac{\text{total gathered information } I_t}{\text{total required energy } E_t} \quad [\text{nats/J}], \tag{63}$$

where the gathered information is about the underlying physical process.

In the following, we summarize the assumptions for the planar ad hoc sensor network that we consider.

(A.1) $n^2$ sensors are located on the grid $\mathcal{I}_n$ with spacing $d_n$, as shown in Fig. 2, and a fusion center is located at the center $(\lfloor n/2 \rfloor, \lfloor n/2 \rfloor)$. The network size is $L \times L$, where $L = nd_n$. Thus, the node density $\mu_n$ on $\mathcal{I}_n$ is given by

$$\mu_n = \frac{n^2}{L^2} = \frac{1}{d_n^2}. \tag{64}$$

(A.2) The observations $\{Y_{ij}\}$ of sensor nodes form a 2-D hidden (discrete-index) SFCAR GMRF on the lattice for each $d_n > 0$, and the edge dependence factor is given by the correlation functions (55) and (60).

(A.3) The fusion center gathers the measurements from all nodes using minimum hop routing. Note that the links in Fig. 2 are not only the Markov dependence edges but also the routing links. The minimum hop routing requires a hop count of $|i - \lfloor n/2 \rfloor| + |j - \lfloor n/2 \rfloor|$ to deliver $Y_{ij}$ to the fusion center.

(A.4) The communication energy per link is given by $E_c(d_n) = E_0 d_n^{\nu}$, where $\nu \ge 2$ is the propagation loss factor of the wireless channel.

(A.5) Sensing requires energy, and the sensing energy per node is denoted by $E_s$. Moreover, we assume that the measurement SNR in (14) is linearly increasing w.r.t. $E_s$, i.e., $\mathrm{SNR} = \beta E_s$ for some constant $\beta$.

Remark 1: Assumption (A.2) facilitates the analysis. Since discrete samples of a continuous-index GMRF do not form a discrete-index GMRF almost surely, we assume that for each $d_n$ sensor samples on $\mathcal{I}_n$ form a discrete-index SFCAR GMRF, and match the correlation between two neighboring nodes with the physically meaningful correlation function (60).




Remark 2: In Assumption (A.3) we assume that there is no data fusion during the information gathering, i.e., no in-network data fusion. The fusion center collects the raw measurements from all sensors.

Remark 3: We can also consider a routing graph different from the Markov dependence graph 
in Fig. 2. For example, sensors not directly connected to the transmitting node via the Markov 
dependence edge can deliver the data to the fusion center. However, this results in a reduced 
number of hops with a larger hop length, and the corresponding routing path consumes more 
energy. Thus, Assumption (A. 3) of minimum hop routing via the Markov dependence edge 
ensures least energy consumption with a minimum hop routing strategy. 

Remark 4: Assumption (A.5) does not imply that we can increase the power of the underlying signal, but it means that we can increase the SNR of effective sensor samples. Suppose that $E_1$ joules are required for one sensing to obtain one sample $Y_{ij}(1) = X_{ij}(1) + W_{ij}(1)$ at location $ij$ and the measurement SNR of this sample is $\mathrm{SNR}_1$. Now assume that we have $M$ identical subsensors at location $ij$ and obtain $M$ samples with one sample per subsensor, requiring $M \cdot E_1$ joules, and we take an average of the $M$ samples at location $ij$, yielding $Y_{ij} = (1/M)\sum_{m=1}^{M} Y_{ij}(m)$, where $Y_{ij}(m)$ denotes the sample at the $m$th subsensor at location $ij$. The measurement SNR of the effective sample $Y_{ij}$ is given by $M \cdot \mathrm{SNR}_1$ assuming that the measurement noise is i.i.d. across the subsensors. Thus, the effective measurement SNR at each sensor can be increased linearly w.r.t. the sensing energy. However, this linear SNR model is an optimistic assumption since the observation SNR may saturate as the sensing energy is increased without bound in practical situations.
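The linear growth of the effective SNR in Remark 4 follows from a one-line variance computation. The worked step below is a sketch under two assumptions that the remark implies but does not spell out in this form: the $M$ colocated subsensors observe the same signal sample, written here simply as $X_{ij}$, and their noises are i.i.d. with a common variance, denoted $\sigma^2$ here.

```latex
% Assumptions (labeled): common signal X_{ij} across subsensors,
% i.i.d. zero-mean noises W_{ij}(m) with variance \sigma^2.
\begin{align*}
Y_{ij} &= \frac{1}{M}\sum_{m=1}^{M}\bigl(X_{ij} + W_{ij}(m)\bigr)
        = X_{ij} + \frac{1}{M}\sum_{m=1}^{M} W_{ij}(m), \\
\operatorname{Var}\Bigl(\frac{1}{M}\sum_{m=1}^{M} W_{ij}(m)\Bigr) &= \frac{\sigma^2}{M}, \\
\mathrm{SNR}_{\mathrm{eff}} &= \frac{E\{X_{ij}^2\}}{\sigma^2 / M} = M \cdot \mathrm{SNR}_1 .
\end{align*}
```

Since the energy spent is $M \cdot E_1$, the effective SNR is proportional to the sensing energy, which is the content of Assumption (A.5).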

From here on, we consider various asymptotic scenarios and investigate the fundamental behavior of ad hoc sensor networks deployed over correlated random fields for statistical inference under assumptions (A.1)-(A.5). Our asymptotic analysis in the previous sections enables us to calculate the total information $I_t$ for large sensor networks. The total amount of information is given approximately by the product of the number of sensor nodes in the network and the asymptotic per-node information $\mathcal{K}_s$ or $\mathcal{I}_s$, i.e.,

$$I_t = n^2\mathcal{K}_s \quad \text{or} \quad I_t = n^2\mathcal{I}_s, \tag{65}$$

for KLI or MI, respectively. The total energy $E_t$ required for data gathering via the minimum hop routing is given by

$$E_t = n^2E_s + \sum_{i=0}^{n-1}\sum_{j=0}^{n-1}\bigl(|i - \lfloor n/2 \rfloor| + |j - \lfloor n/2 \rfloor|\bigr)E_c(d_n) = \begin{cases} n^2E_s + \frac{1}{2}n(n-1)(n+1)E_c(d_n) & \text{if } n \text{ odd}, \\ n^2E_s + \frac{1}{2}n^3E_c(d_n) & \text{if } n \text{ even}. \end{cases} \tag{66}$$
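The closed forms in (66) can be checked by summing the hop counts of Assumption (A.3) directly. This is a small sketch with hypothetical function names; it assumes node indices run from 0 to n-1 in each dimension, with the fusion center at index floor(n/2).

```python
def total_hops(n):
    """Brute-force sum of minimum-hop counts to the fusion center at (n//2, n//2)."""
    c = n // 2
    return sum(abs(i - c) + abs(j - c) for i in range(n) for j in range(n))

def total_hops_closed_form(n):
    """Closed form of the same sum: n(n-1)(n+1)/2 for odd n, n^3/2 for even n."""
    return n * (n - 1) * (n + 1) // 2 if n % 2 else n ** 3 // 2

def total_energy(n, e_s, e_c):
    """Total energy: n^2 sensing energies plus one link energy per hop."""
    return n * n * e_s + total_hops_closed_form(n) * e_c
```

Comparing `total_hops(n)` with `total_hops_closed_form(n)` for a range of `n` confirms the two cases; in both, the hop total grows cubically in `n` while the number of nodes grows only quadratically.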

First, we consider an infinite area model with fixed density. In this case, the number of sensor 
nodes per unit area is fixed and the total area increases without bound as we increase n. The 
behavior of the information vs. area and energy in this case is given in the following theorem. 

Theorem 4 (Fixed density and infinite area) For an ad hoc sensor network with a fixed and finite node density and fixed sensing energy per node, the total amount of information increases linearly w.r.t. area, but the amount of gathered information per unit energy decays to zero with rate

$$\eta = \Theta\left(\text{area}^{-1/2}\right), \tag{67}$$

for any non-trivial diffusion rate $\alpha$, i.e., $0 < \alpha < \infty$, as we increase the area. Further, in this case the total amount of information obtainable from the network as a function of total consumed energy increases with rate of

$$\text{Total information } I_t = \Theta\left(E_t^{2/3}\right), \tag{68}$$

for any propagation loss factor $\nu > 0$, as the total energy $E_t$ consumed by the network increases without bound, i.e., $E_t \to \infty$.

Proof: See Appendix I. 
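The exponents in Theorem 4 can be illustrated numerically from (65) and (66). In the sketch below, the per-node information, sensing energy and per-link energy are held constant, as they are when the spacing is fixed, and the local log-log slope of total information against total energy is estimated by doubling the grid size. The function names and the numerical values are hypothetical; this illustrates the scaling and is not a substitute for the proof.

```python
import math

def network_totals(n, per_node_info, e_s, e_c):
    """Total information n^2 * (per-node information) and total energy as in (66)."""
    hops = n * (n - 1) * (n + 1) / 2 if n % 2 else n ** 3 / 2
    return n * n * per_node_info, n * n * e_s + hops * e_c

def loglog_slope(n, per_node_info, e_s, e_c):
    """Slope of log(information) vs. log(energy) between grid sizes n and 2n."""
    i1, e1 = network_totals(n, per_node_info, e_s, e_c)
    i2, e2 = network_totals(2 * n, per_node_info, e_s, e_c)
    return math.log(i2 / i1) / math.log(e2 / e1)

def efficiency(n, per_node_info, e_s, e_c):
    """Energy efficiency eta = I_t / E_t in nats per joule."""
    info, energy = network_totals(n, per_node_info, e_s, e_c)
    return info / energy
```

For large `n` the slope approaches 2/3 and `n * efficiency(n, ...)` levels off, consistent with an efficiency that decays like the inverse square root of the area.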

Theorem 4 enables us to investigate the asymptotic behavior of ad hoc sensor networks with fixed available energy per node. From the detection perspective, the error probability is given by

$$P_M \sim e^{-I_t(E_t(N_t(A)))}, \tag{69}$$

for large networks, where $N_t(A)$ represents the total number of sensor nodes in the network with coverage area $A$. Now consider that each node has a fixed amount of energy denoted by $E$ ($< \infty$). Then, the total energy in the network is given by

$$E_t = N_t(A)E. \tag{70}$$

Note in this case that the total energy available in the network increases linearly w.r.t. the number of sensor nodes. The asymptotic behavior of ad hoc networks with fixed per-node energy is given by the following corollary to Theorem 4.

Corollary 3: For an ad hoc sensor network with a fixed and finite node density and fixed per-node sensing energy, the information amount per sensor node diminishes to zero as the network size grows, i.e.,

$$\lim_{N_t(A) \to \infty} -\frac{1}{N_t(A)}\log P_M(E_t(N_t(A))) = \lim_{N_t(A) \to \infty} O\left(N_t(A)^{-1/3}\right) = 0, \tag{71}$$

if each sensor has a finite amount of available energy.

Proof: Substitute (68), (69) and (70) into $I_t$, $P_M$ and $E_t$, respectively.

Corollary 3 states that a non-zero per-node information is not achievable as the coverage increases without in-network data fusion in the case that each node has only a fixed amount of energy, which is the case in most network designs with a fixed battery capacity. In this case, the per-node information scales as $O(N_t^{-1/3})$ as the network size grows. This result is caused by the communication energy required for ad hoc routing without in-network data fusion. Note from (66) that for the fixed density and increasing area model the sensing energy increases quadratically with $n$ while the communication energy without in-network data fusion increases cubically with $n$ since $d_n$ is fixed w.r.t. $n$. Hence, for ad hoc sensor networks with large coverage areas the communication energy dominates the sensing energy, and both the energy efficiency for information and the per-node information under a fixed per-node energy constraint diminish to zero because the total information grows more slowly than the communication energy required for ad hoc routing without in-network data fusion.

This diminishing energy efficiency and per-node information under a fixed per-node energy constraint can be remedied with in-network data fusion. Suppose that in-network data fusion is performed so that each node needs to deliver (aggregated) data only to the neighboring node along the minimum hop route to the fusion center in Fig. 2. In this case the number of transmissions associated with one node is just one and the total number of transmissions in the network is $O(n^2)$. So, the communication energy as well as the sensing energy increases quadratically with $n$. Since the total amount of information also increases quadratically with $n$, the total amount of information as a function of total energy is given, under this aggregation scenario, by

$$I_t = \Theta(E_t), \tag{72}$$

as we increase the area, and a non-zero energy efficiency and a non-zero per-node information under a fixed per-node energy constraint are achieved. Thus, in-network data fusion is essential for energy efficiency in large sensor networks.

Next, we consider the case in which the node density diminishes, i.e., $d_n \to \infty$. This case is of particular interest at high SNR since at high SNR less correlated samples yield larger per-node information, as seen in Section III-B.1. However, the per-node information is upper bounded as $d_n \to \infty$, and the asymptotic behavior is given by the following theorem.

Theorem 5: As $d_n \to \infty$, the per-node information $\mathcal{K}_s$ and $\mathcal{I}_s$ converge to $D(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,1+\mathrm{SNR}))$ and $\frac{1}{2}\log(1+\mathrm{SNR})$, respectively, and the convergence rates are given by

$$\mathcal{K}_s(d_n) = D(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,1+\mathrm{SNR})) - c_4\sqrt{d_n}\,e^{-\alpha d_n} + o\left(\sqrt{d_n}\,e^{-\alpha d_n}\right) \tag{73}$$

and

$$\mathcal{I}_s(d_n) = \frac{1}{2}\log(1+\mathrm{SNR}) - c_4'\sqrt{d_n}\,e^{-\alpha d_n} + o\left(\sqrt{d_n}\,e^{-\alpha d_n}\right), \tag{74}$$

with positive constants $c_4$ and $c_4'$.
Proof: See Appendix I. 

Theorem 5 explains how much gain in information is obtained from less correlated observation samples by making the sensor spacing larger. Fig. 8 shows the per-node KLI $\mathcal{K}_s$ and the communication energy $E_c$ for each link as functions of $d_n$ for $\alpha = 1$, $c_4 = 1$ and 10 dB SNR. The gain in information is given by $\sqrt{d_n}\,e^{-\alpha d_n}$ for large $d_n$, whereas the required per-link communication energy increases without bound, i.e., $E_c(d_n) = E_0 d_n^{\nu}$ ($\nu \ge 2$). Since the exponential term is dominant in the gain as $d_n$ increases, the information gained by increasing the sensor spacing $d_n$ decreases almost exponentially fast, and no significant gain is obtained by increasing the sensor spacing further after some point. Hence, it is not effective, in terms of energy efficiency, to increase the sensor spacing too much to obtain less correlated samples at high SNR.

Fig. 8

Per-node information and per-link communication energy w.r.t. sensor spacing $d_n$ (SNR = 10 dB, $\alpha = 1$, $c_4 = 1$)

From Theorem 5 we have seen that increasing the sensor spacing is not so effective in terms of the information gain per unit of consumed energy since the per-link communication energy increases without bound. On the other hand, the per-link communication energy can be made arbitrarily small by decreasing the sensor spacing. To investigate the effect of diminishing communication energy $E_c$ as $d_n \to 0$, we now consider the asymptotic case in which the node density goes to infinity for a fixed coverage area. In this case, the per-node information decays to zero as $d_n \to 0$ since $\zeta \to 1/4$ as $d_n \to 0$, and $\mathcal{K}_s(\zeta)$ and $\mathcal{I}_s(\zeta)$ converge to zero as $\zeta \to 1/4$, as shown in Section III-B.1. The asymptotic behavior in this case is given by the following theorem.

Theorem 6 (Infinite density model) For the infinite density model with a fixed coverage area $S$ with nontrivial diffusion rate $\alpha$, the per-node information decays to zero with convergence rate

$$\mathcal{K}_s = c_5\mu_n^{-1} + o\left(\mu_n^{-1}\right), \tag{75}$$

for some constant $c_5$ as the node density $\mu_n \to \infty$. Hence, the amount of total information from the coverage area converges to the constant $c_5 S$ as $\mu_n \to \infty$. Furthermore, in the case of no sensing energy, a non-zero energy efficiency $\eta$ is achievable if the propagation loss factor $\nu = 3$, and even an infinite energy efficiency* is achievable under Assumption (A.4) if $\nu > 3$ as $\mu_n \to \infty$. $\mathcal{I}_s$ has similar behavior.

* Of course, this is under Assumption (A.4) for any $d_n > 0$. In reality, Assumption (A.4) is valid for $d_n > d_{\min}$ for some $d_{\min} > 0$.
Proof: See Appendix I. 

Remark 5: The finite total information for the infinite density and fixed area model follows our intuition. The maximum information provided by the samples from the continuous-index random field does not exceed the information between $X(x,y)$ and $Y(x,y)$ except in the case of spatially white fields. Here, the relevance of (62) in 2-D is evident. From (62) we have

$$\mathcal{K}_{s,2\text{-D}}(\zeta(\rho(d_n))) = c_6 \cdot d_n^2 + o(d_n^2), \tag{76}$$

as $d_n \to 0$ since $h : d_n \to \rho$ has slope zero at $d_n = 0$ and $\mathcal{K}_s$ is a continuous and differentiable function of $\zeta$. In the 1-D case, it is shown in [12] that $\mathcal{K}_{s,1\text{-D}}$ is also a continuous and differentiable function of $a = e^{-Ad_n}$ for $0 \le a \le 1$ with $\mathcal{K}_{s,1\text{-D}}|_{a=1} = 0$. However, the exponential correlation $e^{-Ad_n}$ has a nonzero slope at $d_n = 0$, and thus we have

$$\mathcal{K}_{s,1\text{-D}}(a(d_n)) = c_6' \cdot d_n + o(d_n), \tag{77}$$

as $d_n \to 0$. The number of nodes in the space is given by $\Theta(n^2)$ and $\Theta(n)$ for 2-D and 1-D, respectively, and $d_n = L/n$ in both cases. Hence, the total amount of information from the coverage space (given by the product of the per-node information and the number of nodes in the space) converges to a constant both in 1-D and 2-D as the node density increases. Thus, any proper 2-D correlation function w.r.t. the sample distance should have a flat top at a distance of zero.

Remark 6: It is common that the propagation loss factor $\nu > 3$ for near field propagation (i.e., $d_n \to 0$). Hence, infinite energy efficiency is theoretically achievable under Assumption (A.4) as we increase the node density for a fixed area assuming that only communication energy is required. Note that the total amount of information converges to a constant as we increase the node density. So, the infinite energy efficiency is achieved by diminishing communication energy as $d_n \to 0$.
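The role of the propagation loss factor in Theorem 6 and Remark 6 can be seen by evaluating the communication part of (66) with the spacing tied to the grid size for a fixed side length. The sketch below uses a hypothetical function name and normalized parameter values; it only illustrates how the total communication energy scales with the grid size under Assumption (A.4).

```python
def comm_energy(n, side, e0, nu):
    """Total communication energy for an n-by-n grid covering a side-by-side area,
    with per-link energy e0 * d^nu at spacing d = side / n."""
    hops = n * (n - 1) * (n + 1) / 2 if n % 2 else n ** 3 / 2
    return hops * e0 * (side / n) ** nu
```

With these definitions the energy grows roughly linearly in `n` for a loss factor of 2, levels off at `e0 * side**3 / 2` for a loss factor of 3, and vanishes for a loss factor of 4. Since the total information converges to a constant, this matches the stated cases of zero, non-zero and unbounded energy efficiency when sensing energy is ignored.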

Remark 7: Considering the sensing energy, infinite energy efficiency is not feasible even theoretically since we have in this case

$$E_t = n^2E_s + \Theta(n^{3-\nu}), \tag{78}$$

and

$$\eta = \frac{I_t}{E_t} \sim \frac{c_5 S}{n^2E_s}, \tag{79}$$

as $n \to \infty$ for a fixed coverage area. In this case the sensing energy $n^2E_s$ is the dominant factor for low energy efficiency, and the energy efficiency decreases to zero with rate $O(\mu_n^{-1})$. Thus, it is critical for densely deployed sensor networks to minimize the sensing energy or processing energy for each sensor.

In the infinite density model, we have observed that energy is an important factor in efficiency. 
Now we investigate the change of total information w.r.t. energy. There are many possible ways 
to invest energy in the network. One simple way is to fix the node density and coverage area 
and to increase the sensing energy. We assume that the network size is sufficiently large so that 
our asymptotic analysis is valid. The energy-asymptotic behavior in this case is given in the 
following theorem under Assumptions (A.1)-(A.5). 

Theorem 7: As we increase the total energy Et consumed by a sensor network (including both 
sensing and communication) with a fixed node density and fixed area, the total information 
increases with rate 

$$\text{Total information } I_t = O(\log E_t) \tag{80}$$

as $E_t \to \infty$.

Proof: See Appendix I. 

Theorem 7 suggests a guideline for investing the excess energy. It is not efficient in terms of the 
total amount of gathered information to invest energy to improve the quality of sensed samples 
from a limited area. This only provides an increase in total information at a logarithmic rate. 
Note in Theorem 4 that the information gain is given by

$$I_t = \Theta\left(E_t^{2/3}\right) \tag{81}$$

as we increase the coverage area with fixed density and sensing energy even without in-network data fusion. Thus, the energy should be spent to increase the number of samples by enlarging the coverage area even if it yields less accurate samples. In this way, we can achieve the information increase with rate at least $\Theta(E_t^{2/3})$, which is much faster than the logarithmic increase obtained by increasing the sensing energy.

C. Optimal Node Density 

In the previous section, we investigated the asymptotic behavior of the total information 
obtainable from the network and the energy efficiency as the coverage, density or energy change. We now consider another important problem in sensor network design for statistical inference
about underlying random fields, namely, the optimal density problem. Here, we are given a fixed 
coverage area, and are interested in determining an optimal node density. The total amount of 
information gathered from the network increases monotonically (even if it has an upperbound) 
as we increase the node density, as shown in Theorem 6. Hence, the problem cannot be properly 
formulated without some constraint. We consider a total energy constraint in which a fixed 
amount of energy is available to the entire network for both sensing and communication. Thus, 
we consider the following problem. 

Problem 1 (Optimal density) Given a fixed coverage area with size $L \times L$ and total available energy $E_t$, find the density $\mu_n$ that maximizes the total information $I_t$ obtainable from the sensor network.

The above optimization problem can be solved using our analysis based on the large deviations principle, assuming the asymptotic result is still valid in the low density case, and the optimal density for the KLI measure is given by

$$\mu_n^* = \arg\max_{\mu_n} L^2\mu_n\,\mathcal{K}_s(\mathrm{SNR}(E_t, \mu_n), d_n(\mu_n)), \tag{82}$$

$$\text{s.t.} \quad n^2E_s(\mu_n) + \frac{1}{2}n(n-1)(n+1)E_c(d_n(\mu_n)) \le E_t, \tag{83}$$

where the sensing energy $E_s$ as well as $n$ and $d_n$ are functions of the node density $\mu_n$. From $\mu_n$ ($= n^2/L^2$), we first calculate $n$ and then $d_n = L/n$. (Here, the quantization of $n$ to the nearest integer is not performed.) With the determined $d_n$, $E_c(d_n)$ is obtained from the propagation parameters $E_0$ and $\nu$, and then $E_s(\mu_n)$ is obtained from the constraint (83). When $E_s(\mu_n)$ is determined, the measurement SNR is calculated using Assumption (A.5), i.e., $\mathrm{SNR} = \beta E_s$, and finally we evaluate the per-node information $\mathcal{K}_s(\mathrm{SNR}, \zeta(\rho(d_n)))$ and $\mathcal{I}_s(\mathrm{SNR}, \zeta(\rho(d_n)))$ from Corollary 2.
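The evaluation procedure just described can be sketched end to end. The code below is a self-contained illustration under several labeled assumptions, not the authors' implementation: the argument of $K$ is treated as the elliptic modulus, the edge correlation map is taken as $((2/\pi)K(4\zeta) - 1)/(4\zeta(2/\pi)K(4\zeta))$ and inverted by bisection, $K_1$ is computed from its integral representation, the budget constraint is assumed to hold with equality, and the maximization is a plain grid search. All function names are hypothetical; the default parameter values are those quoted for Fig. 9.

```python
import math

def ellip_k(k):
    """K(k) for modulus 0 <= k < 1 via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    while abs(a - b) > 1e-15 * a:
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)

def bessel_k1(x, steps=2000):
    """K_1(x), x > 0, from its integral representation (trapezoidal rule)."""
    t_max = math.acosh(1.0 + 50.0 / x)
    dt = t_max / steps
    g = lambda t: math.exp(-x * math.cosh(t)) * math.cosh(t)
    return dt * (0.5 * (g(0.0) + g(t_max)) + sum(g(i * dt) for i in range(1, steps)))

def rho_from_zeta(zeta):
    """Edge correlation f(zeta); a short series is used for small zeta."""
    if zeta < 1e-3:
        return zeta + 5.0 * zeta ** 3
    s = (2.0 / math.pi) * ellip_k(4.0 * zeta)
    return (s - 1.0) / (4.0 * zeta * s)

def zeta_from_rho(rho, iterations=80):
    """g(rho) by bisection, assuming f is increasing; saturates just below 1/4."""
    lo, hi = 0.0, 0.25 * (1.0 - 1e-12)
    if rho >= rho_from_zeta(hi):
        return hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if rho_from_zeta(mid) < rho else (lo, mid)
    return 0.5 * (lo + hi)

def per_node_kli(snr, zeta, m=48):
    """Midpoint-rule approximation of (40)."""
    scale = (2.0 / math.pi) * ellip_k(4.0 * zeta)
    cosines = [math.cos(-math.pi + (i + 0.5) * 2.0 * math.pi / m) for i in range(m)]
    total = 0.0
    for c1 in cosines:
        for c2 in cosines:
            x = snr / (scale * (1.0 - 2.0 * zeta * (c1 + c2)))
            total += 0.5 * math.log1p(x) + 0.5 / (1.0 + x) - 0.5
    return total / (m * m)

def total_kli(mu, e_t, side=2.0, alpha=100.0, beta=1.0, e0=0.1, nu=2.0):
    """Objective of (82) at density mu [nodes/m^2], spending the whole budget e_t.
    Intended for densities with at least one node per side (n >= 1)."""
    n = side * math.sqrt(mu)            # from mu = n^2 / L^2, not rounded
    d = side / n                        # sensor spacing d_n = L / n
    e_c = e0 * d ** nu                  # per-link energy, Assumption (A.4)
    e_s = (e_t - 0.5 * n * (n - 1.0) * (n + 1.0) * e_c) / (n * n)
    if e_s <= 0.0:
        return 0.0                      # budget consumed by communication alone
    snr = beta * e_s                    # Assumption (A.5)
    zeta = zeta_from_rho(alpha * d * bessel_k1(alpha * d))
    return n * n * per_node_kli(snr, zeta)

def best_density(densities, e_t):
    """Grid search for the candidate density with the largest total KLI."""
    return max(densities, key=lambda mu: total_kli(mu, e_t))
```

A typical use would be `best_density([1.0 + 0.5 * i for i in range(200)], 50.0)`. The result depends on the grid of candidate densities and on the numerical tolerances above, so it should be read as indicative rather than as a reproduction of Fig. 9.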

Fig. 9

(a) Total KLI vs. density and (b) total MI vs. density

Fig. 9 shows the total information obtainable from a 2 meter x 2 meter area as we change the node density $\mu_n$ with a fixed total energy budget of $E_t$ joules. Other parameters that we use are given by

$$\alpha = 100, \quad \beta = 1, \quad E_0 = 0.1 \quad \text{and} \quad \nu = 2.$$

Here, the values of $E_t$, $E_0$ and $\beta$ are selected so that the minimum and maximum per-node sensing SNRs are roughly -10 to 10 dB for maximum and minimum densities, respectively. The diffusion rate $\alpha = 100$ is chosen for the edge correlation coefficient $\rho$ to range from almost zero
to 0.6 as the node density varies. It is seen in the figure that there is an optimal density for each 
value of Et under either information measure. It is also seen that the total KLI is sensitive to 
the density change whereas the total MI is less sensitive. The existence of the optimal density 
is explained as follows. At low densities, we have only a few sensors in the area. So, the energy 
for communication is not large due to the small number of communicating nodes (see (108) 
below) and most of the energy is allocated to sensing. Here, the per-node sensing energy is 
even higher due to the small number of sensors. However, the per-node information increases 
only logarithmically w.r.t. the sensing energy or SNR by Theorem 7, and this logarithmic gain 
cannot compensate for the loss in the number of sensors. Hence, low density yields very poor 
performance, and large gain is obtained initially as we increase the density from very low values, 
as seen in Fig. 9. As we further increase the density, on the other hand, the per-node sensing 
energy or SNR decreases due to the increase in the overall communication and the increase in 
the number of sensor nodes, and the measurement SNR is in the low SNR regime eventually, 
where (49) and (51) hold. From (66), we have

$$E_s(\mu_n) = \beta^{-1}\mathrm{SNR} = O(n^{-2}) \tag{84}$$

for fixed $E_t$ and $E_c = E_0(L/n)^{\nu}$, as $n \to \infty$. By the quadratic decaying behavior of $\mathcal{K}_s$ at low SNR given by (49), the total Kullback-Leibler information is given by

$$\text{Total KLI} = L^2\mu_n\mathcal{K}_s = \Theta(n^2 n^{-4}) = \Theta(n^{-2}) = \Theta(\mu_n^{-1}).$$

By (51), on the other hand, the mutual information decays linearly as SNR decreases to zero, and the total mutual information is given by

$$\text{Total MI} = L^2\mu_n\mathcal{I}_s = \Theta(n^2 n^{-2}) = \Theta(1).$$

This explains the initial fast decay after the peak in Fig. 9 (a) and the flat curve in Fig. 9 (b). In
the above equations, however, the effect of $\zeta$ on $\mathcal{K}_s$ and $\mathcal{I}_s$ is not considered. As the node
density increases, the sensor spacing decreases and the edge dependence factor $\zeta$ increases for a
given diffusion rate $\alpha$. The behavior of the per-node information as a function of $\zeta$ is shown in
Fig. 4. Note in Fig. 4 that the per-node information has a second lobe at strong correlation at low
SNR, while at high SNR it decreases monotonically as the correlation becomes strong. The benefit
of sample correlation is evident in the low energy case ($E_t = 50$ [J]) in Fig. 9 (a); a second peak
around $\mu_n = 95$ [nodes/m$^2$] is observed, although it is not very significant. Since the per-node
information eventually decays to zero as $\zeta \to 1/4$, the total amount of information eventually
decreases as we increase the node density further, as seen in the right corner of the figure.
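The two scaling statements above can be illustrated numerically in the simplest setting. The sketch below is only an illustration under stated assumptions, not the computation behind Fig. 9: it ignores spatial correlation (it takes $\zeta = 0$, so the per-node KLI reduces to $D(\mathcal{N}(0,1)\|\mathcal{N}(0,1+\mathrm{SNR}))$ and the per-node MI to $\frac{1}{2}\log(1+\mathrm{SNR})$), sets $\mathrm{SNR} = c/n^2$ with a hypothetical constant `c`, and multiplies by the $n^2$ nodes.

```python
import math

def per_node_kli(snr):
    # D(N(0,1) || N(0,1+SNR)) = 0.5*log(1+SNR) + 0.5/(1+SNR) - 0.5
    return 0.5 * math.log1p(snr) + 0.5 / (1.0 + snr) - 0.5

def per_node_mi(snr):
    # Mutual information of a scalar Gaussian channel, in nats
    return 0.5 * math.log1p(snr)

def totals(n, c=1.0):
    # c is a hypothetical proportionality constant: SNR = c / n^2
    snr = c / n**2
    return n**2 * per_node_kli(snr), n**2 * per_node_mi(snr)

for n in (10, 20, 40, 80):
    total_kli, total_mi = totals(n)
    # total_kli * n^2 should level off (Theta(n^-2) decay); total_mi should level off (Theta(1))
    print(n, total_kli * n**2, total_mi)
```

In this uncorrelated sketch the rescaled KLI column approaches $c^2/4$ and the MI column approaches $c/2$, which matches the $\Theta(n^{-2})$ and $\Theta(1)$ behavior derived above.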

V. Conclusion and Discussion 

In this paper, we have considered the design of sensor networks for statistical inference about 
correlated random fields in a 2-D setting. To quantify the information from the sensor network, 
we have used a spectral domain approach to derive closed-form expressions for asymptotic KLI 
and MI rates in general d-D and in 2-D in particular, and have adopted the 2-D hidden CAR 
GMRF for our signal model to capture the spatial correlation and measurement noise for samples 
in a 2-D sensor field. Under the first order symmetry assumption, we have further obtained the 
asymptotic information rates explicitly in terms of the SNR and the edge dependence factor, and 
have investigated the properties of the asymptotic information rates as functions of SNR and 
correlation. Based on these LDP results, we have then analyzed the asymptotic behavior of ad 
hoc sensor networks deployed over 2-D correlated random fields for statistical inference. Under 
the SFCAR GMRF model, we have obtained fundamental scaling laws for total information and
energy efficiency as the coverage, node density and consumed energy change. The results provide
guidelines for sensor network design for statistical inference about 2-D correlated random fields 
such as temperature, humidity, or density of a gas on a certain area. 

In closing, we discuss several issues related to some of the assumptions we have used to simplify 
our analysis. First, of course, sensors in a real network may not be located on a 2-D grid. 
However, we conjecture that similar scaling behaviors w.r.t. the coverage, density and energy 
are valid for randomly and uniformly deployed sensors. Secondly, the spatial Markov assumption 
may be restrictive. However, it is a minimal model that captures the two dimensionality of the 
signal correlation structure in all planar directions and keeps the analysis tractable. Finally,
we have not considered the temporal evolution of the spatial signal field. In the case of i.i.d.
temporal variation, the results here can be applied directly without modification. When the
signal variation over time is correlated, an extension to spatio-temporal fields is required.

Appendix I 

Proof of Theorem 2 
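The asymptotic KLI rate $\mathcal{K}$ is given by the almost-sure limit

$$\mathcal{K} = \lim_{n\to\infty} \frac{1}{|\mathcal{D}_n|} \log \frac{p_0}{p_1}(\{Y_{\mathbf{i}},\ \mathbf{i} \in \mathcal{D}_n\}), \tag{85}$$

evaluated under $p_0$ [24]. We consider the following index mapping from d-D to 1-D in
lexicographic order:

$$i = \mu_d(\mathbf{i}), \quad \mathbf{i} \in [0, 1, \cdots, n-1]^d, \tag{86}$$

and the corresponding observation vector $\mathbf{y}_{|\mathcal{D}_n|}$ generated from $\{Y_{\mathbf{i}},\ \mathbf{i} \in \mathcal{D}_n\}$. Then, $\mathbf{y}_{|\mathcal{D}_n|}$ is a
zero-mean Gaussian vector with the covariance matrices $\Sigma_{0,|\mathcal{D}_n|}$ and $\Sigma_{1,|\mathcal{D}_n|}$ under $p_0$ and $p_1$,
respectively. Hence, the asymptotic KLI rate is given by

$$\mathcal{K} = \lim_{n\to\infty} \frac{1}{|\mathcal{D}_n|} \left( \frac{1}{2} \log \frac{\det(\Sigma_{1,|\mathcal{D}_n|})}{\det(\Sigma_{0,|\mathcal{D}_n|})} + \frac{1}{2}\, \mathbf{y}_{|\mathcal{D}_n|}^T \left( \Sigma_{1,|\mathcal{D}_n|}^{-1} - \Sigma_{0,|\mathcal{D}_n|}^{-1} \right) \mathbf{y}_{|\mathcal{D}_n|} \right), \tag{87}$$

under $p_0$. Now we consider the terms on the RHS of (87). First, we consider $\log\det(\Sigma_{0,|\mathcal{D}_n|})$.
Since $\Sigma_{0,|\mathcal{D}_n|} = \sigma^2 \mathbf{I}_{n^d}$ under the assumption of an i.i.d. null distribution, we simply have

$$\frac{1}{|\mathcal{D}_n|} \log\det \Sigma_{0,|\mathcal{D}_n|} = \frac{1}{n^d} \log\det(\sigma^2 \mathbf{I}_{n^d}) = \log \sigma^2. \tag{88}$$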

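Next we consider the term $\frac{1}{|\mathcal{D}_n|} \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{0,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|}$. Since $\mathbf{y}_{|\mathcal{D}_n|}$ is i.i.d. Gaussian, the dimension $d$ is
irrelevant in this case, the known result from [25, Proposition 10.8.3] is applicable, and we have

$$\frac{1}{|\mathcal{D}_n|}\, \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{0,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} \to 1, \tag{89}$$

assuming that the random vector $\mathbf{y}_{|\mathcal{D}_n|}$ is generated from the distribution $p_0$. Now we consider the
term $\frac{1}{|\mathcal{D}_n|}\log\det \Sigma_{1,|\mathcal{D}_n|}$. This is the entropy rate of a d-D Gaussian process, and the convergence
behavior of this term is studied in [18]. It is shown in [18, p. 391] under the assumption in
Theorem 2 that we have

$$\log\det \Sigma_{1,|\mathcal{D}_n|} = \frac{|\mathcal{D}_n|}{(2\pi)^d} \int_{[-\pi,\pi)^d} \log\left((2\pi)^d f_1(\boldsymbol{\omega})\right) d\boldsymbol{\omega} + O\!\left(\frac{|\mathcal{D}_n|}{n}\right).$$

Applying this result, we have

$$\frac{1}{|\mathcal{D}_n|} \log\det \Sigma_{1,|\mathcal{D}_n|} \to \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \log\left((2\pi)^d f_1(\boldsymbol{\omega})\right) d\boldsymbol{\omega}. \tag{90}$$

Finally, we consider the random term $\frac{1}{|\mathcal{D}_n|} \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{1,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|}$.** By Lemma 2 in Appendix II, we
have

$$\frac{1}{|\mathcal{D}_n|}\, \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{1,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} \to \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{\sigma^2}{(2\pi)^d f_1(\boldsymbol{\omega})}\, d\boldsymbol{\omega} \tag{91}$$

almost surely as $n \to \infty$.

Combining (87) - (91), we have

$$\mathcal{K} = \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \left( \frac{1}{2} \log \frac{(2\pi)^d f_1(\boldsymbol{\omega})}{\sigma^2} + \frac{1}{2} \frac{\sigma^2}{(2\pi)^d f_1(\boldsymbol{\omega})} - \frac{1}{2} \right) d\boldsymbol{\omega}. \tag{92}$$

Since

$$D(\mathcal{N}(0,\sigma_0^2)\,\|\,\mathcal{N}(0,\sigma_1^2)) = \frac{1}{2} \log \frac{\sigma_1^2}{\sigma_0^2} + \frac{1}{2} \frac{\sigma_0^2}{\sigma_1^2} - \frac{1}{2}, \tag{93}$$

(92) is given by

$$\mathcal{K} = \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} D\left(\mathcal{N}(0,\sigma^2)\,\|\,\mathcal{N}(0,(2\pi)^d f_1(\boldsymbol{\omega}))\right) d\boldsymbol{\omega}. \tag{94}$$
■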
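The limit in (92) can be sanity-checked numerically in one dimension. The following is a minimal sketch under assumptions that are not taken from the paper: $d = 1$, an AR(1) signal correlation $\rho^{|h|}$ with unit power, and hypothetical parameter values `rho` and `snr`. It compares the expected per-sample log-likelihood ratio under $p_0$ for a finite $n$ with the spectral integral.

```python
import numpy as np

def finite_n_kli_rate(n, rho=0.6, snr=2.0, sigma2=1.0):
    # Sigma_0 = sigma2 * I (noise only); Sigma_1 = sigma2 * (I + snr * R),
    # where R[i, j] = rho**|i-j| is an assumed unit-power AR(1) correlation matrix.
    idx = np.arange(n)
    R = rho ** np.abs(idx[:, None] - idx[None, :])
    sigma1 = sigma2 * (np.eye(n) + snr * R)
    _, logdet1 = np.linalg.slogdet(sigma1)
    # E_0[y^T Sigma_1^{-1} y] = sigma2 * trace(Sigma_1^{-1}); E_0[y^T Sigma_0^{-1} y] = n
    trace_term = sigma2 * np.trace(np.linalg.inv(sigma1))
    return (0.5 * (logdet1 - n * np.log(sigma2)) + 0.5 * trace_term - 0.5 * n) / n

def spectral_kli_rate(rho=0.6, snr=2.0, grid=200000):
    # (2*pi) f_1(w) / sigma2 = 1 + snr * g(w), with g the AR(1) power spectrum (mean 1).
    w = -np.pi + 2 * np.pi * (np.arange(grid) + 0.5) / grid
    g = (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)
    s = 1 + snr * g
    # Averaging over a uniform grid approximates (1 / 2pi) * integral over [-pi, pi)
    return np.mean(0.5 * np.log(s) + 0.5 / s - 0.5)

print(finite_n_kli_rate(400), spectral_kli_rate())
```

The two printed values should agree up to an edge effect that shrinks as $n$ grows, consistent with the $O(|\mathcal{D}_n|/n)$ correction in the log-determinant.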



**The proof given in [25] and [26] for the convergence of this term for the 1-D index case is not applicable for
general d-D, nor is the almost-sure convergence of the term shown in [18], where the convergence of the term in
probability to an integral involving the periodogram was shown. Thus, we prove the almost-sure convergence of
the term in Lemma 2 separately in Appendix II.

Proof of Corollary 1
For the 2-D hidden model we have

$$f_1(\omega_1,\omega_2) = (2\pi)^{-2}\sigma^2 + f(\omega_1,\omega_2), \tag{95}$$

where $f(\omega_1,\omega_2)$ is the CAR spectrum (11) in 2-D satisfying (12) and (13). First, $f_1(\omega_1,\omega_2)$ has
a positive lower bound, and thus satisfies Assumption A.1 in Theorem 2. It is also known in [27]
that if $\mathbf{k} = (k_1, \cdots, k_d) \in \mathbb{N}^d$ and if $f_1(\boldsymbol{\omega})$ is of class $C^{(\mathbf{k})}$ (i.e., differentiable up to the $k_i$-th order
w.r.t. $\omega_i$), then

$$\limsup_{\mathbf{h}\to\infty}\ h_1^{k_1} \cdots h_d^{k_d}\, |\gamma_{\mathbf{h}}| < \infty, \tag{96}$$

where $\mathbb{N}$ is the set of all natural numbers, and $\mathbf{h} \to \infty$ means that at least one coordinate tends
to infinity. Under the condition (12) and (13), the hidden CAR spectrum $f_1(\omega_1,\omega_2)$ in (95) is
$C^{(\infty,\infty)}$, i.e., smooth both in $\omega_1$ and $\omega_2$. This ensures that Assumption A.2 in Theorem 2 is
satisfied, and the corollary follows by substituting (95) and $d = 2$ into (26). ■

Proof of Theorem 3 

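The continuity is straightforward. The monotonicity is shown as follows. Let $s = 1 + \mathrm{SNR}\, g_\zeta(\boldsymbol{\omega})$
where $g_\zeta(\boldsymbol{\omega}) = \left((2/\pi) K(4\zeta)(1 - 2\zeta\cos\omega_1 - 2\zeta\cos\omega_2)\right)^{-1}$. Then, the partial derivative of $\mathcal{K}_s$
w.r.t. SNR is given by

$$\frac{\partial \mathcal{K}_s}{\partial\, \mathrm{SNR}} = \frac{1}{(2\pi)^2} \int_{[-\pi,\pi)^2} \frac{\partial}{\partial s}\left( \frac{1}{2}\log s + \frac{1}{2s} - \frac{1}{2} \right) \frac{\partial s}{\partial\, \mathrm{SNR}}\, d\boldsymbol{\omega}, \tag{97}$$

where

$$\frac{\partial}{\partial s}\left( \frac{1}{2}\log s + \frac{1}{2s} - \frac{1}{2} \right) = \frac{1}{2}\, \frac{s-1}{s^2} = \frac{1}{2}\, \frac{\mathrm{SNR}\, g_\zeta(\boldsymbol{\omega})}{s^2} \ge 0 \tag{98}$$

and

$$\frac{\partial s}{\partial\, \mathrm{SNR}} = g_\zeta(\boldsymbol{\omega}) > 0 \tag{99}$$

for $0 \le \zeta < 1/4$. Hence,

$$\frac{\partial \mathcal{K}_s}{\partial\, \mathrm{SNR}} > 0, \tag{100}$$

and $\mathcal{K}_s$ increases monotonically as SNR increases for a given $\zeta$ ($0 \le \zeta < 1/4$).
As $\mathrm{SNR} \to \infty$, we have

$$\mathcal{K}_s \to \frac{1}{4\pi^2} \int_{[-\pi,\pi)^2} \frac{1}{2} \log\left(\mathrm{SNR}\, g_\zeta(\boldsymbol{\omega})\right) d\boldsymbol{\omega} = \frac{1}{2} \log \mathrm{SNR} + \frac{1}{4\pi^2} \int_{[-\pi,\pi)^2} \frac{1}{2} \log\left(g_\zeta(\boldsymbol{\omega})\right) d\boldsymbol{\omega}.$$

Thus, we have $\frac{1}{2} \log \mathrm{SNR}$ behavior at high SNR.

For (49) and (51), take the Taylor expansion around SNR = 0 to obtain

$$\log(1 + \mathrm{SNR}\, g_\zeta(\boldsymbol{\omega})) = \mathrm{SNR}\, g_\zeta(\boldsymbol{\omega}) - \mathrm{SNR}^2 g_\zeta^2(\boldsymbol{\omega})/2 + \cdots,$$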

March 9, 2009 DRAFT 



TO APPEAR IN IEEE TRANS. ON INFORMATION THEORY, JUNE 2009 37 

and then integrate. ■
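The three regimes used in this proof (monotonic growth, logarithmic growth at high SNR, and polynomial behavior at low SNR) can be checked numerically. The sketch below is an illustration under assumptions: the integrals are replaced by averages over a midpoint grid, the grid size `m` and the value $\zeta = 0.2$ are arbitrary choices, and the factor $(2/\pi)K(4\zeta)$ is not evaluated directly but replaced by numerically normalizing $g_\zeta$ to unit mean, which is assumed here to be the role of that factor.

```python
import numpy as np

def g_zeta(zeta, m=256):
    # Midpoint grid on [-pi, pi)^2
    w = 2 * np.pi * (np.arange(m) + 0.5) / m - np.pi
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    raw = 1.0 / (1.0 - 2 * zeta * np.cos(w1) - 2 * zeta * np.cos(w2))
    # Assumption: unit-mean normalization stands in for the (2/pi) K(4 zeta) factor
    return raw / raw.mean()

def kli_rate(snr, g):
    # Grid average of 0.5*log(s) + 1/(2s) - 0.5 with s = 1 + SNR * g
    x = snr * g
    return np.mean(0.5 * np.log1p(x) + 0.5 / (1.0 + x) - 0.5)

def mi_rate(snr, g):
    # Grid average of 0.5*log(1 + SNR * g)
    return np.mean(0.5 * np.log1p(snr * g))

g = g_zeta(0.2)
for snr in (1e-3, 1e-1, 1e1, 1e3):
    print(snr, kli_rate(snr, g), mi_rate(snr, g))
```

At small SNR the first rate behaves like $\mathrm{SNR}^2$ times the grid average of $g_\zeta^2/4$ and the second like $\mathrm{SNR}/2$, which follows from integrating the Taylor expansion term by term; at large SNR, multiplying SNR by 100 adds about $\frac{1}{2}\log 100$ to either rate.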
Proof of Theorem 4 

In this case, the edge length $d_n = d$ for all $n$, and thus the asymptotic per-sensor information
$\mathcal{K}_s(d_n)$ or $\mathcal{I}_s(d_n)$ does not change with $n$. Considering the Kullback-Leibler information, we have
$I_t = n^2 \mathcal{K}_s(d)$, and area $= \Theta(n^2)$. Hence, the total information is linear w.r.t. area. The total
energy $E_t$ required for data gathering is given by

$$E_t = n^2 E_s + E_c(d) \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \left( |i - \lfloor n/2 \rfloor| + |j - \lfloor n/2 \rfloor| \right) = n^2 E_s + \Theta(n^3) E_c(d), \tag{101}$$

where the first term is the sensing energy and the second term is the energy consumed for
communication. The energy efficiency is given by

$$\eta = \frac{I_t}{E_t} = \frac{n^2 \mathcal{K}_s(d)}{n^2 E_s + \Theta(n^3) E_c(d)} = \Theta\!\left(\frac{1}{n}\right), \tag{102}$$

as $n \to \infty$. Since area $= \Theta(n^2)$, (67) follows.

For the second statement we have $E_t = \Theta(n^3)$. The total information is given by $n^2 \mathcal{K}_s(\mathrm{SNR}, d)$.
Since $\mathcal{K}_s$ is fixed, the total information is $\Theta(n^2)$ as $n \to \infty$, and we have (68). ■
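The $\Theta(n^3)$ hop-count sum in (101) and the resulting $\Theta(1/n)$ efficiency in (102) are easy to confirm by direct enumeration. The sketch below uses hypothetical unit values for the per-node information and the energies (`kli_per_node`, `e_s`, `e_c` are placeholders, not values from the paper).

```python
def hop_count_sum(n):
    # Double sum in (101): Manhattan distance from node (i, j) to the center node
    c = n // 2
    return sum(abs(i - c) + abs(j - c) for i in range(n) for j in range(n))

def energy_efficiency(n, kli_per_node=1.0, e_s=1.0, e_c=1.0):
    # eta = I_t / E_t with I_t = n^2 * K_s and E_t = n^2 * E_s + E_c * (hop-count sum)
    total_info = n**2 * kli_per_node
    total_energy = n**2 * e_s + e_c * hop_count_sum(n)
    return total_info / total_energy

for n in (8, 16, 32, 64):
    # hop_count_sum(n) / n^3 equals 1/2 for even n; n * eta approaches a constant
    print(n, hop_count_sum(n) / n**3, n * energy_efficiency(n))
```

For even $n$ the double sum equals $n^3/2$ exactly, so with the unit placeholders $n\,\eta = n/(1 + n/2)$, which tends to 2.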

Proof of Theorem 5 

The proof is by the asymptotic behavior of the modified Bessel function $K_1(\cdot)$ of the second kind
and Taylor expansion of $\mathcal{K}_s$ (as a function of $\zeta$) and $\zeta$ (as a function of $\rho$), which is allowed
because of their continuous differentiability. From (60) and (61) we have

$$\rho = \sqrt{\frac{\pi}{2}}\, \sqrt{\alpha d_n}\, e^{-\alpha d_n} + o\!\left(\sqrt{\alpha d_n}\, e^{-\alpha d_n}\right) \tag{103}$$

as $d_n \to \infty$. From the continuous differentiability of $\mathcal{K}_s$ as a function of $\zeta$ in (47) and $\zeta$ as a
function of $\rho$, we have

$$\begin{aligned}
\mathcal{K}_s &= D(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,1+\mathrm{SNR})) - c_2 \zeta + o(\zeta), \\
&= D(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,1+\mathrm{SNR})) - c_2 (c_7 \rho + o(\rho)) + o(c_7 \rho + o(\rho)), \\
&= D(\mathcal{N}(0,1)\,\|\,\mathcal{N}(0,1+\mathrm{SNR})) - c_2 c_7 \rho + o(\rho),
\end{aligned}$$

for some $c_2, c_7 > 0$. Applying (103) to the above equation, we have (73). The proof for the
mutual information $\mathcal{I}_s$ is similar. ■
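The large-argument behavior of $K_1$ that drives this proof can be checked with a few lines of standard-library code. This sketch rests on two assumptions that are not visible in this section: that the edge correlation has the form $\rho = x K_1(x)$ with $x = \alpha d_n$, and that $K_1$ may be evaluated through the integral representation $K_1(x) = \int_0^\infty e^{-x\cosh t}\cosh t\, dt$ (a standard identity for $x > 0$). The truncation point `t_max` and step count are arbitrary numerical choices.

```python
import math

def bessel_k1(x, t_max=10.0, steps=100000):
    # Trapezoidal rule for K_1(x) = int_0^inf exp(-x cosh t) cosh t dt, truncated at t_max
    dt = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * dt
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * dt

def rho_exact(x):
    # Assumed form of the edge correlation: rho = x * K_1(x), with x = alpha * d_n
    return x * bessel_k1(x)

def rho_asymptotic(x):
    # Leading term for large x: sqrt(pi/2) * sqrt(x) * exp(-x)
    return math.sqrt(math.pi / 2) * math.sqrt(x) * math.exp(-x)

for x in (5.0, 20.0, 50.0):
    print(x, rho_exact(x) / rho_asymptotic(x))
```

The printed ratio approaches 1 as $x$ grows, which is the content of the leading term and the $o(\cdot)$ remainder.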

Proof of Theorem 6 

Consider a fixed area with size $L \times L$ and a lattice $\mathcal{D}_n$ on it. The sensor spacing $d_n$ for $n$ is given
by

$$d_n = \frac{L}{n}.$$

By (62), we have

$$\rho(d_n) = 1 + c_8 \cdot d_n^2 + o(d_n^2) \tag{104}$$

for some constant $c_8$. By the continuous differentiability of $\mathcal{K}_s$ (as a function of $\zeta$) and $\zeta$ (as a
function of $\rho$), we have

$$\zeta = \frac{1}{4} + c_9 \cdot (1-\rho) + O((1-\rho)^2),$$

and

$$\mathcal{K}_s = c_1 \cdot (\zeta - 1/4) + o(\zeta - 1/4),$$

for some constants $c_9$ and $c_1$. Substituting (104) into the above equations gives

$$\mathcal{K}_s = c_{10}\, d_n^2 + o(d_n^2), \tag{105}$$

for some constant $c_{10}$. The node density is given by

$$\mu_n = \frac{n^2}{L^2} = d_n^{-2}. \tag{106}$$

Substituting (106) into (105) yields (75). The total amount of information per unit area is given
by

$$\mu_n \mathcal{K}_s = c_5 + o(1), \tag{107}$$

and it converges to $c_5$ as $n \to \infty$, where $c_5 = c_{10}$ by (105) and (106).

To calculate the energy efficiency, we first calculate the total communication energy consumed
by the minimum hop routing, given by

$$\begin{aligned}
E_t' &= E_c(d_n) \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \left( |i - \lfloor n/2 \rfloor| + |j - \lfloor n/2 \rfloor| \right), \\
&= \Theta(n^3) E_c(d_n) = E_0 L^{\nu} n^{-\nu}\, \Theta(n^3), \\
&= \Theta(n^{3-\nu}),
\end{aligned} \tag{108}$$

as $n \to \infty$ (i.e., $\mu_n \to \infty$). Here, $E_t'$ denotes the total energy considering only the communication
energy. The energy efficiency in this case is given by

$$\eta' = \frac{\mu_n \mathcal{K}_s}{E_t'} \quad [\text{nats/J/m}^2]. \tag{109}$$

Applying (107) and (108) to the above equation, we have the claims. ■
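The exponent $3 - \nu$ in (108) can be read off numerically from a log-log slope. The sketch below assumes the per-hop energy model $E_c(d_n) = E_0 d_n^{\nu}$ with $d_n = L/n$, as used in (108); the values of `side`, `e0`, and `nu` are hypothetical placeholders.

```python
import math

def comm_energy(n, side=1.0, e0=1.0, nu=2.0):
    # E_t' in (108): per-hop energy E_0 * d_n^nu times the total hop count to the center
    d_n = side / n
    c = n // 2
    hops = sum(abs(i - c) + abs(j - c) for i in range(n) for j in range(n))
    return e0 * d_n**nu * hops

def loglog_slope(n1, n2, **kwargs):
    # Empirical scaling exponent of E_t' between lattice sizes n1 and n2
    return math.log(comm_energy(n2, **kwargs) / comm_energy(n1, **kwargs)) / math.log(n2 / n1)

for nu in (2.0, 3.0, 4.0):
    print(nu, loglog_slope(32, 64, nu=nu))
```

Under this model the total communication energy grows with density when $\nu < 3$ and shrinks when $\nu > 3$, so by (107) and (109) the efficiency $\eta'$ scales as $n^{\nu-3}$.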

Proof of Theorem 7 
Note that

$$E_t = n^2 E_s + \Theta(n^3) E_c(d_n).$$

In this case, $n$ and $d_n$ are fixed, and Theorem 3 is directly applicable. Since the number of nodes
and communication energy are fixed, the sensing energy increases linearly with the total energy
$E_t$. By Assumption (A.5), the measurement SNR increases linearly with the sensing energy.
Applying Theorem 3 yields (80). ■

Appendix II 

To prove Lemma 2 (this will be stated below), we briefly introduce some relevant preliminary 
results. 

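Definition 5 (Matrix norms [18,28]) Let $\mathbf{A}$ be an $n \times n$ matrix with singular value decomposition

$$\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^H = \sum_{i=1}^{n} s_i \mathbf{u}_i \mathbf{v}_i^H, \tag{110}$$

where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices with columns $\mathbf{u}_i$ and $\mathbf{v}_i$, respectively, and $\mathbf{S} = \mathrm{diag}(s_1, s_2, \cdots, s_n)$
with nonnegative elements $s_1 \ge s_2 \ge \cdots \ge s_n \ge 0$. The operator norm of $\mathbf{A}$ is defined as

$$\|\mathbf{A}\| = s_1 = \sup_{\mathbf{x}} \|\mathbf{A}\mathbf{x}\| / \|\mathbf{x}\|, \tag{111}$$

where $\|\mathbf{x}\|$ denotes the 2-norm of $\mathbf{x}$. On the other hand, the trace class norm of $\mathbf{A}$ is defined as

$$\|\mathbf{A}\|_1 = \sum_i s_i. \tag{112}$$

Note that if $\mathbf{A}$ is a symmetric matrix with eigenvalues $\{\lambda_i\}$, then

$$\|\mathbf{A}\|_1 = \sum_i |\lambda_i|. \tag{113}$$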
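The two norms of Definition 5 and the identity (113) can be illustrated on a small example. This sketch uses an arbitrary random symmetric matrix (seed and size are placeholders); the operator norm corresponds to the matrix 2-norm and the trace class norm to the nuclear norm.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T                                  # arbitrary symmetric test matrix

s = np.linalg.svd(A, compute_uv=False)       # singular values s_1 >= ... >= s_n >= 0
operator_norm = s[0]                         # (111): largest singular value
trace_class_norm = s.sum()                   # (112): sum of singular values
eigenvalues = np.linalg.eigvalsh(A)          # real eigenvalues of the symmetric matrix

print(operator_norm, np.linalg.norm(A, 2))
print(trace_class_norm, np.linalg.norm(A, "nuc"), np.abs(eigenvalues).sum())
```

Each printed line should show matching values; the last pair is the statement (113) for symmetric matrices.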
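Remark 8 (The covariance matrix and its circulant approximation) Using vector notation, the
covariance matrix of the vector $\mathbf{y}_{|\mathcal{D}_n|}$ in (29) under $p_1$ is given by

$$\Sigma_{1,|\mathcal{D}_n|} = E_1\{\mathbf{y}_{|\mathcal{D}_n|} \mathbf{y}_{|\mathcal{D}_n|}^T\} = [\sigma_{\mu_d(\mathbf{i})\mu_d(\mathbf{j})}], \quad \sigma_{\mu_d(\mathbf{i})\mu_d(\mathbf{j})} = \gamma_{\mathbf{i}-\mathbf{j}}, \quad \mathbf{i},\mathbf{j} \in \mathcal{D}_n, \tag{114}$$

where $\gamma_{\mathbf{h}}$ is defined in (23) and $\mu_d$ is defined in (86). With slight abuse of notation, we use $\sigma_{\mathbf{i}\mathbf{j}}$
for $\sigma_{\mu_d(\mathbf{i})\mu_d(\mathbf{j})}$ for the sake of exposition.

The circulant approximation $\mathbf{C}_{|\mathcal{D}_n|}$ to $\Sigma_{1,|\mathcal{D}_n|}$ is obtained by treating $\mathcal{D}_n$ as a high dimensional
torus with opposite ends being neighbors, and $\mathbf{C}_{|\mathcal{D}_n|}$ is given by

$$\mathbf{C}_{|\mathcal{D}_n|} = [c_{\mathbf{i}\mathbf{j}}], \quad c_{\mathbf{i}\mathbf{j}} = \gamma_{\pi(\mathbf{i}-\mathbf{j})}, \quad \mathbf{i},\mathbf{j} \in \mathcal{D}_n, \tag{115}$$

where the mapping $\pi : \mathbb{Z}^d \to \mathbb{Z}^d$ is defined as

$$\pi(\mathbf{h}) = \pi(h_1, h_2, \cdots, h_d) = (h_1', h_2', \cdots, h_d'), \tag{116}$$

and

$$h_k' = h_k\, I(|h_k| \le n/2) + (n - |h_k|)\, I(|h_k| > n/2), \quad k = 1, \cdots, d.^{\dagger\dagger} \tag{117}$$

Here, $I(\cdot)$ is the indicator function. Note that $\Sigma_{1,|\mathcal{D}_n|}$ is a block Toeplitz matrix, while $\mathbf{C}_{|\mathcal{D}_n|}$ is a
block circulant matrix. It is known that the eigenvalues of the block circulant matrix $\mathbf{C}_{|\mathcal{D}_n|}$ are
given by

$$\lambda_{\mathbf{i}} = \sum_{\mathbf{h} \in \mathcal{D}_n} \gamma_{\pi(\mathbf{h})}\, e^{-\mathrm{j}\, \boldsymbol{\omega}_{\mathbf{i}}^T \mathbf{h}} \tag{118}$$

for $\mathbf{i} = (i_1, \cdots, i_d) \in \mathcal{D}_n$, where

$$\boldsymbol{\omega}_{\mathbf{i}} = (\omega_{i_1}, \omega_{i_2}, \cdots, \omega_{i_d}) = \left( \frac{2\pi i_1}{n}, \frac{2\pi i_2}{n}, \cdots, \frac{2\pi i_d}{n} \right). \tag{119}$$

Define the periodic approximate spectral density by

$$f_n^{\pi}(\boldsymbol{\omega}) = (2\pi)^{-d} \sum_{\mathbf{h} \in \mathcal{D}_n} \gamma_{\pi(\mathbf{h})}\, e^{-\mathrm{j}\, \boldsymbol{\omega}^T \mathbf{h}}. \tag{120}$$

Then, the eigenvalues of $\mathbf{C}_{|\mathcal{D}_n|}$ are given by

$$\lambda_{\mathbf{i}} = (2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{i}}), \quad \mathbf{i} \in \mathcal{D}_n. \tag{121}$$

††The distinction of even and odd n will not be considered for simplicity, as this is merely a technical issue. In
either case, the asymptotic behavior is the same.

Further, it is shown in [18, Lemma 4.1.(c)] that the periodic approximate spectral density
converges uniformly to the true spectral density, i.e.,

$$\sup_{\boldsymbol{\omega}} |f_n^{\pi}(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega})| \to 0 \tag{122}$$

as $n \to \infty$.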
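A one-dimensional instance of this remark makes (118), (121), and (122) concrete. The sketch below assumes $d = 1$ and a hypothetical symmetric covariance $\gamma_h = \rho^{|h|}$; because $\gamma$ is symmetric, the wrapped lag is taken as a nonnegative integer, which is equivalent to (117) for this purpose.

```python
import numpy as np

def wrapped_lag(h, n):
    # 1-D version of the mapping pi in (116)-(117), applied to |h| (gamma is symmetric)
    h = abs(h)
    return h if h <= n / 2 else n - h

def circulant_approximation(gamma, n):
    # c_ij = gamma_{pi(i - j)} as in (115)
    return np.array([[gamma(wrapped_lag(i - j, n)) for j in range(n)] for i in range(n)])

n, rho = 16, 0.5
gamma = lambda h: rho ** abs(h)              # assumed example covariance sequence
C = circulant_approximation(gamma, n)

eig_direct = np.sort(np.linalg.eigvalsh(C))  # eigenvalues from a dense symmetric solver
eig_dft = np.fft.fft(C[0]).real              # (118): DFT of the wrapped covariance
w = 2 * np.pi * np.arange(n) / n             # frequencies omega_i in (119)
true_spectrum = (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)   # (2 pi) f_1(omega_i)

print(np.allclose(eig_direct, np.sort(eig_dft)))
print(np.max(np.abs(eig_dft - true_spectrum)))
```

The first line confirms that the circulant eigenvalues are DFT samples; the second shows that they are already close to the samples of the true spectrum for moderate $n$, with a gap that shrinks as $n$ grows.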



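Lemma 1: Under the assumption of Theorem 2, we have
(a) $f_n^{\pi}(\boldsymbol{\omega})$ is uniformly continuous for sufficiently large $n$.
(b)

$$\sup_{\boldsymbol{\omega}} \left| \frac{1}{f_n^{\pi}(\boldsymbol{\omega})} - \frac{1}{f_1(\boldsymbol{\omega})} \right| \to 0 \quad \text{as } n \to \infty. \tag{123}$$

(c) $1/f_n^{\pi}(\boldsymbol{\omega})$ is uniformly continuous for sufficiently large $n$.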
Proof of Lemma 1 

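(a) By assumption, $f_1(\boldsymbol{\omega})$ is continuous on the compact domain $[-\pi, \pi]^d$. By the uniform
continuity theorem, $f_1(\boldsymbol{\omega})$ is uniformly continuous. For any $\epsilon > 0$, there exists $\delta > 0$ such that
$\|\boldsymbol{\omega} - \boldsymbol{\omega}'\| < \delta$ implies

$$\begin{aligned}
|f_n^{\pi}(\boldsymbol{\omega}) - f_n^{\pi}(\boldsymbol{\omega}')| &= |f_n^{\pi}(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega}) + f_1(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega}') + f_1(\boldsymbol{\omega}') - f_n^{\pi}(\boldsymbol{\omega}')|, \\
&\le |f_n^{\pi}(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega})| + |f_1(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega}')| + |f_1(\boldsymbol{\omega}') - f_n^{\pi}(\boldsymbol{\omega}')|, \\
&\le \epsilon/3 + \epsilon/3 + \epsilon/3,
\end{aligned}$$

for sufficiently large $n$. The convergence of the first and third terms is by (122) and that of the
second term is by the uniform continuity of $f_1(\boldsymbol{\omega})$.

(b) Since the spectrum $f_1(\boldsymbol{\omega})$ has a positive lower bound by assumption, its inverse $1/f_1(\boldsymbol{\omega})$ is
bounded from above. In addition, due to (122) there exists $M_1 > 0$ such that

$$\frac{1}{f_1(\boldsymbol{\omega})} < M_1 \quad \text{and} \quad \frac{1}{f_n^{\pi}(\boldsymbol{\omega})} < M_1, \tag{124}$$

for all $\boldsymbol{\omega} \in [-\pi, \pi)^d$ and for sufficiently large $n$. Then, for any $\epsilon > 0$,

$$\left| \frac{1}{f_n^{\pi}(\boldsymbol{\omega})} - \frac{1}{f_1(\boldsymbol{\omega})} \right| = \frac{1}{f_n^{\pi}(\boldsymbol{\omega})}\, \frac{1}{f_1(\boldsymbol{\omega})}\, |f_n^{\pi}(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega})| \tag{125}$$

$$\le M_1^2\, |f_n^{\pi}(\boldsymbol{\omega}) - f_1(\boldsymbol{\omega})| < \epsilon \tag{126}$$

for all $\boldsymbol{\omega} \in [-\pi, \pi)^d$ and for sufficiently large $n$, by (122) and (124).

(c) For any $\epsilon > 0$, there exists $\delta > 0$ such that $\|\boldsymbol{\omega} - \boldsymbol{\omega}'\| < \delta$ implies

$$\begin{aligned}
\left| \frac{1}{f_n^{\pi}(\boldsymbol{\omega})} - \frac{1}{f_n^{\pi}(\boldsymbol{\omega}')} \right| &= \left| \frac{1}{f_n^{\pi}(\boldsymbol{\omega})} - \frac{1}{f_1(\boldsymbol{\omega})} + \frac{1}{f_1(\boldsymbol{\omega})} - \frac{1}{f_1(\boldsymbol{\omega}')} + \frac{1}{f_1(\boldsymbol{\omega}')} - \frac{1}{f_n^{\pi}(\boldsymbol{\omega}')} \right|, \\
&\le \left| \frac{1}{f_n^{\pi}(\boldsymbol{\omega})} - \frac{1}{f_1(\boldsymbol{\omega})} \right| + \left| \frac{1}{f_1(\boldsymbol{\omega})} - \frac{1}{f_1(\boldsymbol{\omega}')} \right| + \left| \frac{1}{f_1(\boldsymbol{\omega}')} - \frac{1}{f_n^{\pi}(\boldsymbol{\omega}')} \right|, \\
&\le \epsilon/3 + \epsilon/3 + \epsilon/3,
\end{aligned}$$

for sufficiently large $n$. The convergence of the first and third terms is by (123) and that of
the second term is by the uniform continuity of $1/f_1(\boldsymbol{\omega})$. (The uniform continuity of $1/f_1(\boldsymbol{\omega})$ is
obvious due to the uniform continuity and strict positivity of $f_1(\boldsymbol{\omega})$.) ■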



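Lemma 2: Under the conditions of Theorem 2, we have

$$\frac{1}{|\mathcal{D}_n|}\, \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{1,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} \to \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{\sigma^2}{(2\pi)^d f_1(\boldsymbol{\omega})}\, d\boldsymbol{\omega},$$

almost surely.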

Proof of Lemma 2 
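First, it is shown in [18, Lemma 4.1.(a)] that

$$\frac{1}{|\mathcal{D}_n|} \left\| \Sigma_{1,|\mathcal{D}_n|} - \mathbf{C}_{|\mathcal{D}_n|} \right\|_1 = O\!\left(\frac{1}{n}\right) \tag{127}$$

as $n \to \infty$. Let $\{\lambda_{|\mathcal{D}_n|}(i),\ i = 1, 2, \cdots, |\mathcal{D}_n|\}$ be the eigenvalues of $\frac{1}{|\mathcal{D}_n|}(\Sigma_{1,|\mathcal{D}_n|} - \mathbf{C}_{|\mathcal{D}_n|})$, where
$|\mathcal{D}_n| = n^d$ for d-D. Then, by (113) and (127) we have

$$\sum_{i=1}^{|\mathcal{D}_n|} |\lambda_{|\mathcal{D}_n|}(i)| = O\!\left(\frac{1}{n}\right). \tag{128}$$

Since the convergence of the eigenvalues of the block Toeplitz matrix $\Sigma_{1,|\mathcal{D}_n|}$ and its block
circulant approximation $\mathbf{C}_{|\mathcal{D}_n|}$ is uniform (the eigenvalues of these matrices are the samples
of the corresponding spectra for sufficiently large $n$; see (121) and (122)), $\min_i |\lambda_{|\mathcal{D}_n|}(i)|$ and
$\max_i |\lambda_{|\mathcal{D}_n|}(i)|$ have the same convergence rate, i.e., there exist $M_2$, $M_3$ and $r_n$ such that

$$M_2 r_n \le \min_i |\lambda_{|\mathcal{D}_n|}(i)| \le \max_i |\lambda_{|\mathcal{D}_n|}(i)| \le M_3 r_n. \tag{129}$$

By (128) and (129) we have

$$r_n = O\!\left(\frac{1}{n^{d+1}}\right). \tag{130}$$

Since the spectra $f_1(\boldsymbol{\omega})$ and $f_n^{\pi}(\boldsymbol{\omega})$ have positive lower bounds by assumption, their inverses
$1/f_1(\boldsymbol{\omega})$ and $1/f_n^{\pi}(\boldsymbol{\omega})$ are bounded from above. Hence, the eigenvalues of $\Sigma_{1,|\mathcal{D}_n|}^{-1}$ and $\mathbf{C}_{|\mathcal{D}_n|}^{-1}$ are
bounded from above since the eigenvalues of these matrices are the samples of the corresponding
inverse spectra for sufficiently large $n$, and thus we have

$$\|\Sigma_{1,|\mathcal{D}_n|}^{-1}\| < M_1 \quad \text{and} \quad \|\mathbf{C}_{|\mathcal{D}_n|}^{-1}\| < M_1 \tag{131}$$

for all sufficiently large $n$.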

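Now consider the error between two quadratic terms.

$$\begin{aligned}
\left| \frac{1}{|\mathcal{D}_n|}\, \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{1,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} - \frac{1}{|\mathcal{D}_n|}\, \mathbf{y}_{|\mathcal{D}_n|}^T \mathbf{C}_{|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} \right|
&\overset{(a)}{\le} M_1^2 \max_i |\lambda_{|\mathcal{D}_n|}(i)| \sum_{i=1}^{|\mathcal{D}_n|} y_i^2 \\
&\overset{(b)}{\le} M_1^2 M_3\, r_n \sum_{i=1}^{|\mathcal{D}_n|} y_i^2 \\
&\overset{(c)}{\le} \frac{C}{n} \cdot \frac{1}{|\mathcal{D}_n|} \sum_{i=1}^{|\mathcal{D}_n|} y_i^2 \\
&\overset{(d)}{\to} 0 \quad \text{a.s.}
\end{aligned} \tag{132}$$

for some $C > 0$. Here, step (a) is by (131) and the definition of the trace class norm (113), step
(b) is by (129), and step (c) is by (130). Step (d) is by the strong law of large numbers (SLLN)
on the sample mean of $y_i^2$. Since $\{y_i\}$ is i.i.d. $\mathcal{N}(0, \sigma^2)$ under $p_0$, $\frac{1}{|\mathcal{D}_n|} \sum_{i=1}^{|\mathcal{D}_n|} y_i^2 \to \sigma^2$ almost surely.

Thus, the quadratic form using the block circulant approximation converges almost surely to
that based on the true covariance matrix.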

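We next consider the asymptotic behavior of $|\mathcal{D}_n|^{-1} \mathbf{y}_{|\mathcal{D}_n|}^T \mathbf{C}_{|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|}$. Since $\mathbf{C}_{|\mathcal{D}_n|}$ is a block
circulant matrix, the eigendecomposition is given by [29,30]

$$\mathbf{C}_{|\mathcal{D}_n|} = \mathbf{W}_{|\mathcal{D}_n|} \boldsymbol{\Lambda}_{|\mathcal{D}_n|} \mathbf{W}_{|\mathcal{D}_n|}^H, \tag{133}$$

where $\mathbf{W}_{|\mathcal{D}_n|}$ is the d-dimensional discrete Fourier transform (DFT) matrix which is unitary, and

$$\boldsymbol{\Lambda}_{|\mathcal{D}_n|} = \mathrm{diag}(\lambda_{0,\cdots,0}, \cdots, \lambda_{n-1,\cdots,n-1}). \tag{134}$$

The inverse of $\mathbf{C}_{|\mathcal{D}_n|}$ is given by

$$\mathbf{C}_{|\mathcal{D}_n|}^{-1} = \mathbf{W}_{|\mathcal{D}_n|} \boldsymbol{\Lambda}_{|\mathcal{D}_n|}^{-1} \mathbf{W}_{|\mathcal{D}_n|}^H. \tag{135}$$

Define

$$\tilde{\mathbf{y}}_{|\mathcal{D}_n|} = \mathbf{W}_{|\mathcal{D}_n|}^H \mathbf{y}_{|\mathcal{D}_n|}. \tag{136}$$

Then, $\tilde{\mathbf{y}}_{|\mathcal{D}_n|}$ is a vector of i.i.d. Gaussian random variables since $\mathbf{W}_{|\mathcal{D}_n|}$ is unitary and $\mathbf{y}_{|\mathcal{D}_n|}$ is a
vector with i.i.d. Gaussian elements under $p_0$. Thus, $|\mathcal{D}_n|^{-1} \mathbf{y}_{|\mathcal{D}_n|}^T \mathbf{C}_{|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|}$ is given by

$$S_n = |\mathcal{D}_n|^{-1}\, \mathbf{y}_{|\mathcal{D}_n|}^T \mathbf{C}_{|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} = |\mathcal{D}_n|^{-1}\, \tilde{\mathbf{y}}_{|\mathcal{D}_n|}^H \boldsymbol{\Lambda}_{|\mathcal{D}_n|}^{-1} \tilde{\mathbf{y}}_{|\mathcal{D}_n|}, \tag{137}$$

$$= \frac{1}{n^d} \sum_{i_1=0}^{n-1} \cdots \sum_{i_d=0}^{n-1} \frac{Y_{\mathbf{i}}^2}{\lambda_{i_1,\cdots,i_d}}, \tag{138}$$

where $\{Y_{\mathbf{i}},\ \mathbf{i} \in \mathcal{D}_n\}$ is i.i.d. zero-mean Gaussian with variance $\sigma^2$. For sufficiently large $n$, fix $K$
($0 < K < n$) and divide the indices of each dimension such that

$$\mathcal{I} = [0, 1, \cdots, n-1] = \mathcal{I}(0) \cup \mathcal{I}(1) \cup \cdots \cup \mathcal{I}(K-1),$$
$$\mathcal{I}(i) \cap \mathcal{I}(j) = \emptyset \ \text{ if } i \ne j, \text{ and}$$
$$|\mathcal{I}(0)| = \cdots = |\mathcal{I}(K-2)| = \lfloor n/K \rfloor, \quad |\mathcal{I}(K-1)| = n - (K-1)|\mathcal{I}(0)|.$$

Then, (138) is given by

$$S_n = \frac{1}{n^d} \sum_{j_1=0}^{K-1} \cdots \sum_{j_d=0}^{K-1} \left( \sum_{i_1 \in \mathcal{I}(j_1)} \cdots \sum_{i_d \in \mathcal{I}(j_d)} \frac{Y_{\mathbf{i}}^2}{\lambda_{i_1,\cdots,i_d}} \right). \tag{139}$$

Now let $\mathbf{i}(j_1, \cdots, j_d)$ denote the index representing the center of the $(j_1, \cdots, j_d)$-th hypercube.
Then, by (121) we have

$$\frac{1}{\lambda_{\mathbf{i}(j_1,\cdots,j_d)}} = \frac{1}{(2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{j}})}, \tag{140}$$

$$\boldsymbol{\omega}_{\mathbf{j}} = (\omega_{j_1}, \cdots, \omega_{j_d}) = \left( \frac{2\pi j_1}{K}, \cdots, \frac{2\pi j_d}{K} \right), \tag{141}$$

and

$$\left| \frac{1}{(2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{i}})} - \frac{1}{(2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{j}})} \right| = \left| \frac{1}{\lambda_{i_1,\cdots,i_d}} - \frac{1}{(2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{j}})} \right| < \epsilon' \tag{142}$$

for all $(i_1, \cdots, i_d)$ in the $(j_1, \cdots, j_d)$-th hypercube. Here, $\epsilon'$ ($> 0$) is independent of $(j_1, \cdots, j_d)$
since $1/f_n^{\pi}(\boldsymbol{\omega})$ is uniformly continuous over $\boldsymbol{\omega} \in [-\pi, \pi)^d$ by Lemma 1 (c). Applying (142) to
(139), we have

$$V_n - \epsilon'\, n^{-d} \sum_{\mathbf{i} \in \mathcal{D}_n} Y_{\mathbf{i}}^2 \le S_n \le V_n + \epsilon'\, n^{-d} \sum_{\mathbf{i} \in \mathcal{D}_n} Y_{\mathbf{i}}^2, \tag{143}$$

where

$$V_n = \frac{1}{K^d} \sum_{j_1=0}^{K-1} \cdots \sum_{j_d=0}^{K-1} \frac{1}{(2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{j}})} \left( \frac{1}{|\mathcal{I}(j_1)| \cdots |\mathcal{I}(j_d)|} \sum_{i_1 \in \mathcal{I}(j_1)} \cdots \sum_{i_d \in \mathcal{I}(j_d)} Y_{\mathbf{i}}^2 \right). \tag{144}$$

By the SLLN for the sample mean of $Y_{\mathbf{i}}^2$, we have

$$\sigma^2 - \epsilon'' \le \frac{1}{|\mathcal{I}(j_1)| \cdots |\mathcal{I}(j_d)|} \sum_{i_1 \in \mathcal{I}(j_1)} \cdots \sum_{i_d \in \mathcal{I}(j_d)} Y_{\mathbf{i}}^2 \le \sigma^2 + \epsilon'' \tag{145}$$

almost surely for sufficiently large $n$ given $K$. Thus, $V_n$ is given by

$$(\sigma^2 - \epsilon'') Z_n \le V_n \le (\sigma^2 + \epsilon'') Z_n, \tag{146}$$

where

$$Z_n = \frac{1}{K^d} \sum_{j_1=0}^{K-1} \cdots \sum_{j_d=0}^{K-1} \frac{1}{(2\pi)^d f_n^{\pi}(\boldsymbol{\omega}_{\mathbf{j}})}. \tag{147}$$

Now we take $K \to \infty$, and the Riemann sum $Z_n$ converges to

$$Z_n \to \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{1}{(2\pi)^d f_1(\boldsymbol{\omega})}\, d\boldsymbol{\omega} \tag{148}$$

by Lemma 1 (b) and (c). Since $\epsilon'$ and $\epsilon''$ can be made arbitrarily small by making $n$ and $K$ large,
$\frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{1}{(2\pi)^d f_1(\boldsymbol{\omega})}\, d\boldsymbol{\omega} < M_4$ for some $M_4 > 0$, and $n^{-d} \sum_{\mathbf{i} \in \mathcal{D}_n} Y_{\mathbf{i}}^2 \to \sigma^2$ a.s., we have by
(143), (146) and (148), that

$$S_n \to \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{\sigma^2}{(2\pi)^d f_1(\boldsymbol{\omega})}\, d\boldsymbol{\omega} \tag{149}$$

almost surely as $n \to \infty$. By (132) and (149) we have

$$\frac{1}{|\mathcal{D}_n|}\, \mathbf{y}_{|\mathcal{D}_n|}^T \Sigma_{1,|\mathcal{D}_n|}^{-1} \mathbf{y}_{|\mathcal{D}_n|} \to \frac{1}{(2\pi)^d} \int_{[-\pi,\pi)^d} \frac{\sigma^2}{(2\pi)^d f_1(\boldsymbol{\omega})}\, d\boldsymbol{\omega}$$

almost surely as $n \to \infty$. This concludes the proof. ■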
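The statement of Lemma 2 can be illustrated with a single simulated realization in one dimension. This is a sketch under assumptions that are not from the paper: $d = 1$, an AR(1) signal correlation $\rho^{|h|}$, hypothetical values for `rho` and `snr`, and a fixed random seed. The vector is drawn under $p_0$ and the normalized quadratic form is compared with the spectral integral.

```python
import numpy as np

def quadratic_form_rate(n, rho=0.6, snr=2.0, sigma2=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    # Sigma_1 = sigma2 * (I + snr * R), with R[i, j] = rho**|i-j| (assumed AR(1) model)
    sigma1 = sigma2 * (np.eye(n) + snr * rho ** np.abs(idx[:, None] - idx[None, :]))
    y = rng.normal(0.0, np.sqrt(sigma2), size=n)     # i.i.d. N(0, sigma2), i.e. under p0
    return y @ np.linalg.solve(sigma1, y) / n        # (1/n) y^T Sigma_1^{-1} y

def limit_value(rho=0.6, snr=2.0, grid=200000):
    # (1 / 2pi) * integral of sigma2 / ((2 pi) f_1(w)), with (2 pi) f_1 = sigma2 * (1 + snr * g)
    w = -np.pi + 2 * np.pi * (np.arange(grid) + 0.5) / grid
    g = (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)
    return np.mean(1.0 / (1.0 + snr * g))

print(quadratic_form_rate(1500), limit_value())
```

A single draw fluctuates around the limit with a spread that shrinks roughly like $1/\sqrt{n}$, so agreement to a couple of decimal places is all that should be expected at this size.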